Abstract: We study the capacity of the discrete-time Gaussian
channel when its output is quantized with a one-bit quantizer. 
We focus on the low signal-to-noise ratio (SNR) regime, where 
communication at very low spectral efficiencies takes place. 
In this regime a symmetric threshold quantizer is known to
reduce channel capacity by a factor of 2/π, i.e., to cause an
asymptotic power loss of approximately two decibels. Here it is
shown that this power loss can be avoided by using asymmetric 
threshold quantizers and asymmetric signaling constellations. We 
prove that, in order to avoid this power loss, flash-signaling 
input distributions are essential. Consequently, one-bit output 
quantization of the Gaussian channel reduces spectral efficiency. 

Threshold quantizers are not only asymptotically optimal: as 
we prove, at every fixed SNR a threshold quantizer maximizes 
capacity among all one-bit output quantizers. 

The picture changes on the Rayleigh-fading channel. In the
noncoherent case we show that a one-bit output quantizer causes
an unavoidable low-SNR asymptotic power loss. In the coherent
case, however, this power loss is avoidable provided that we allow
the quantizer to depend on the fading level.

Index Terms: Capacity per unit-energy, channel capacity,
Gaussian channel, low signal-to-noise ratio (SNR), quantization.



I. Introduction 

We study the effect on channel capacity of quantizing
the output of the discrete-time average-power-limited
Gaussian channel using a one-bit quantizer. This problem
arises in communication systems where the receiver uses 
digital signal processing techniques, which require the analog 
received signal to be quantized using an analog-to-digital 
converter (ADC). For ADCs with high resolution, the effects of 
quantization are negligible. However, using a high-resolution 
ADC may not be practical, especially when the bandwidth 
of the communication system is large and the ADC therefore 
needs to operate at a high sampling rate [1]. In this case a 
low-resolution ADC must be employed. The capacity of the 
discrete-time Gaussian channel with one-bit output quantiza- 
tion indicates what communication rates can be achieved when 
the receiver employs a low-resolution ADC. 

We focus on the low signal-to-noise ratio (SNR) regime,
where communication at very low spectral efficiencies takes
place, as in Spread-Spectrum and Ultra-Wideband communications.
In this regime, a symmetric threshold quantizer¹ reduces
the capacity by a factor of 2/π, corresponding to a 2 dB
power loss [2]. Hence the rule of thumb that "hard decisions
cause a 2 dB power loss." Here we demonstrate that if we
allow for asymmetric threshold quantizers with corresponding 
asymmetric signal constellations, then the two decibels can be 
fully recovered. 

The above result shows that a threshold quantizer is asymp- 
totically optimal as the SNR tends to zero. We further show 
that this is not only true asymptotically: for any fixed SNR, 
we show that, among all one-bit output quantizers, a threshold 
quantizer is optimal. 

Furthermore, we show that the low-SNR asymptotic capacity
can be achieved only by flash-signaling input distributions
[3, Def. 2]. For the Gaussian channel (without output quantization),
it was demonstrated by Verdú that such distributions
result in a poor spectral efficiency [3, Th. 16]. Since output 
quantization cannot increase the spectral efficiency, it follows 
that flash signaling also results in a poor spectral efficiency 
on the quantized Gaussian channel. Thus, in the low-SNR 
regime, the Gaussian channel with optimal one-bit output 
quantization has a poor spectral efficiency. In contrast, the 
low-SNR asymptotic capacity of the unquantized Gaussian 
channel can also be achieved by input distributions that are 
not flash signaling, so the Gaussian channel has a much higher 
spectral efficiency than its quantized version [3]. Thus, while 
quantizing the output of the Gaussian channel with a one- 
bit quantizer does not cause a loss with respect to the low- 
SNR asymptotic capacity, it does cause a significant loss with 
respect to the spectral efficiency. 

It should be noted that the considered discrete-time channel 
model implicitly assumes that the channel output is sampled 
at Nyquist rate. While sampling the output at Nyquist rate 
incurs no loss in capacity for the additive white Gaussian noise 
(AWGN) channel [4], [5], it is not necessarily optimal (with 
respect to capacity) when the channel output is first quantized
using a one-bit quantizer. In fact, for a symmetric threshold
quantizer, sampling the output above the Nyquist rate increases 
the low-SNR asymptotic capacity [6], [7] and it increases the 
capacity in the noiseless case [8], [9]. 

The rest of the paper is organized as follows. Section II 
introduces the channel model and defines the capacity as well 
as the capacity per unit-energy. Section III presents the main 
results of our paper. Section IV demonstrates that the capacity 
per unit-energy can be achieved by a pulse-position modula- 
tion (PPM) scheme. Section V discusses the implications of 
our results on the spectral efficiency. Section VI studies the
effect on the capacity per unit-energy of quantizing the output
of the Rayleigh-fading channel using a one-bit quantizer.
Sections VII through X contain the proofs of our results:
Section VII contains the proofs concerning channel capacity.
Section VIII contains the proofs concerning the capacity per
unit-energy. Section IX contains the proofs concerning peak-
power-limited channels, and Section X contains the proofs
concerning Rayleigh-fading channels. Section XI concludes
the paper with a summary and a discussion of our results.

¹A threshold quantizer produces 1 if its input is above a threshold, and
it produces 0 if it is not. A symmetric threshold quantizer is a threshold
quantizer whose threshold is zero.

Figure 1. System model.

II. Channel Model and Capacity 

We consider the discrete-time communication system depicted
in Figure 1. A message $M$, which is uniformly distributed
over the set $\{1, 2, \ldots, \mathcal{M}\}$, is mapped by an encoder
to the length-$n$ real sequence $X_1, X_2, \ldots, X_n \in \mathbb{R}$ of channel
inputs. (Here $\mathbb{R}$ denotes the set of real numbers.) The channel
corrupts this sequence by adding white Gaussian noise to
produce the unquantized output sequence

$$\tilde{Y}_k = X_k + Z_k, \quad k \in \mathbb{Z} \tag{1}$$

where $\{Z_k, k \in \mathbb{Z}\}$ is a sequence of independent and identically
distributed (i.i.d.) Gaussian random variables of zero
mean and variance $\sigma^2$. (Here $\mathbb{Z}$ denotes the set of integers.)
The unquantized output sequence is then quantized using a
quantizer that is specified by a Borel subset $\mathcal{D}$ of the reals: it
produces 1 if $\tilde{Y}_k$ is in $\mathcal{D}$ and produces 0 if it is not. Denoting
the time-$k$ quantizer output by $Y_k$,

$$Y_k = \begin{cases} 1 & \text{if } \tilde{Y}_k \in \mathcal{D}, \\ 0 & \text{if } \tilde{Y}_k \notin \mathcal{D}. \end{cases}$$



While in this paper we only consider deterministic quantizers,
it should be noted that our results continue to hold if we
allow for randomized quantization rules, i.e., if the quantizer
produces $Y_k$ according to some probability distribution $P_{Y|\tilde{Y}}$
with binary $Y$. In view of the direct relationship between the
set $\mathcal{D}$ and the quantizer it defines, we shall sometimes abuse
notation and refer to $\mathcal{D}$ as the quantizer. An example of a one-bit
quantizer is the threshold quantizer, which corresponds to
the set

$$\mathcal{D} = \{\tilde{y} \in \mathbb{R} : \tilde{y} \geq T\}, \quad T \in \mathbb{R}. \tag{2}$$

The decoder observes the quantizer's outputs $Y_1, Y_2, \ldots, Y_n$
and guesses which message was transmitted. We impose an
average-power constraint on the transmitted sequence: for
every realization of $M$, the sequence $x_1, x_2, \ldots, x_n$ must
satisfy

$$\frac{1}{n} \sum_{k=1}^{n} x_k^2 \leq P \tag{3}$$

for some positive constant $P$ that we call the maximal-allowed
average-power.
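As a minimal illustration of the model (1)-(3), the following Python sketch passes a short input sequence through additive Gaussian noise and a threshold quantizer of the form (2). The function names, the seed, and all numerical values are illustrative assumptions and are not taken from the paper.

```python
import random

def one_bit_channel(x, sigma, threshold, rng):
    """Add Gaussian noise as in (1), then apply the threshold quantizer (2)."""
    outputs = []
    for xk in x:
        y_tilde = xk + rng.gauss(0.0, sigma)               # unquantized output
        outputs.append(1 if y_tilde >= threshold else 0)   # one-bit output
    return outputs

def average_power(x):
    """Left-hand side of the average-power constraint (3)."""
    return sum(xk * xk for xk in x) / len(x)

rng = random.Random(0)            # fixed seed, chosen arbitrarily
x = [2.0, 0.0, 0.0, 0.0]          # one pulse; average power equals 1.0
y = one_bit_channel(x, sigma=1.0, threshold=1.0, rng=rng)
print(average_power(x), y)
```

With a threshold between 0 and the pulse amplitude, the quantizer output is 1 mostly at the pulse position, which is the mechanism exploited later by asymmetric signaling.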



For a fixed quantizer $\mathcal{D}$ and maximal-allowed average-power
$P$, the capacity $C(P, \mathcal{D})$ is [5], [10]

$$C(P, \mathcal{D}) = \sup_{E[X^2] \leq P} I(X; Y) \tag{4}$$



where the supremum is over all distributions of X under 
which the second moment of X does not exceed P. Here and 
throughout the paper we omit the time indices where they are 
immaterial. 

We say that a rate $R$ (in nats per channel use) is achievable
using power $P$ and one-bit quantization if for every $\epsilon > 0$
there exists an encoder satisfying (3) and

$$\frac{\log \mathcal{M}}{n} > R - \epsilon$$

as well as a one-bit quantizer and a decoder such that the
probability of error $\Pr(\hat{M} \neq M)$ tends to zero as $n$ tends to
infinity. Here $\log(\cdot)$ denotes the natural logarithm function.
The capacity $C(P)$ is the supremum of all achievable rates
and is given by

\begin{align}
C(P) &= \sup_{\mathcal{D}} C(P, \mathcal{D}) \tag{5} \\
&= \sup_{\mathcal{D},\, E[X^2] \leq P} I(X; Y) \tag{6}
\end{align}

where the first supremum is over all quantization regions $\mathcal{D}$,
and the second is over all quantization regions $\mathcal{D}$ and over all
distributions of $X$ satisfying $E[X^2] \leq P$.

Following [11] we define the capacity per unit-energy of
the quantizer $\mathcal{D}$ as follows: We say that a rate per unit-energy
$\dot{R}(0, \mathcal{D})$ (in nats per energy) is achievable with the quantizer $\mathcal{D}$
if for every $\epsilon > 0$ there exists an encoder satisfying

$$\sum_{k=1}^{n} x_k^2 \leq E, \quad \text{for every realization of } M \tag{7}$$

and

$$\frac{\log \mathcal{M}}{E} > \dot{R}(0, \mathcal{D}) - \epsilon \tag{8}$$

together with a decoder such that the probability of error
$\Pr(\hat{M} \neq M)$ tends to zero as $E$ tends to infinity. The capacity
per unit-energy $\dot{C}(0, \mathcal{D})$ is the supremum of all achievable
rates per unit-energy with the quantizer $\mathcal{D}$ and is given by
[11, Th. 2]

\begin{align}
\dot{C}(0, \mathcal{D}) &= \sup_{P > 0} \frac{C(P, \mathcal{D})}{P} \tag{9} \\
&= \lim_{P \downarrow 0} \frac{C(P, \mathcal{D})}{P} \tag{10}
\end{align}

where the second equation follows because, for every $\mathcal{D}$, the
capacity $C(P, \mathcal{D})$ is a concave function of $P$.

The definition of capacity per unit-energy using a one-bit
quantizer is analogous. We say that a rate per unit-energy $\dot{R}(0)$
(in nats per energy) is achievable using a one-bit quantizer if
for every $\epsilon > 0$ there exists an encoder satisfying (7) and

$$\frac{\log \mathcal{M}}{E} > \dot{R}(0) - \epsilon \tag{11}$$

as well as a one-bit quantizer and a decoder such that the
probability of error $\Pr(\hat{M} \neq M)$ tends to zero as $E$ tends to
infinity. The capacity per unit-energy $\dot{C}(0)$ is the supremum
of all achievable rates per unit-energy.

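Extending the proof of Theorem 2 in [11] to account for
the additional maximization over all possible quantizers, we
obtain

$$\dot{C}(0) = \sup_{P > 0} \frac{C(P)}{P} \tag{12}$$

which, by (5), can be expressed as

$$\dot{C}(0) = \sup_{P > 0} \sup_{\mathcal{D}} \frac{C(P, \mathcal{D})}{P}. \tag{13}$$

Exchanging the order of the suprema and applying (9) thus
yields

\begin{align}
\dot{C}(0) &= \sup_{\mathcal{D}} \dot{C}(0, \mathcal{D}) \tag{14} \\
&= \sup_{\mathcal{D}} \sup_{\xi \neq 0} \frac{D(P_{Y|X=\xi} \,\|\, P_{Y|X=0})}{\xi^2} \tag{15}
\end{align}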



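where the last step follows from [3, Th. 3]. Here $D(\cdot \| \cdot)$
denotes relative entropy

$$D(P \| Q) = \begin{cases} \int \log \frac{dP}{dQ} \, dP & \text{if } P \ll Q, \\ \infty & \text{otherwise} \end{cases} \tag{16}$$

(where $P \ll Q$ indicates that $P$ is absolutely continuous with
respect to $Q$), and $P_{Y|X=x}$ denotes the output distribution
corresponding to the input $x$. In our case, since the output of
the quantizer is binary,

\begin{align}
D(P_{Y|X=\xi} \,\|\, P_{Y|X=0}) &= \Pr(\tilde{Y} \in \mathcal{D} \mid X = \xi) \log \frac{\Pr(\tilde{Y} \in \mathcal{D} \mid X = \xi)}{\Pr(\tilde{Y} \in \mathcal{D} \mid X = 0)} \\
&\quad + \Pr(\tilde{Y} \notin \mathcal{D} \mid X = \xi) \log \frac{\Pr(\tilde{Y} \notin \mathcal{D} \mid X = \xi)}{\Pr(\tilde{Y} \notin \mathcal{D} \mid X = 0)}.
\end{align}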

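It further follows from (5) and (10) that

\begin{align}
\lim_{P \downarrow 0} \frac{C(P)}{P} &= \lim_{P \downarrow 0} \sup_{\mathcal{D}} \frac{C(P, \mathcal{D})}{P} \\
&\geq \sup_{\mathcal{D}} \dot{C}(0, \mathcal{D}) \tag{17}
\end{align}

which together with (12) and (14) yields

$$\dot{C}(0) = \lim_{P \downarrow 0} \frac{C(P)}{P}. \tag{18}$$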

Thus, the capacity per unit-energy is equal to the slope at zero 
of the capacity-vs-power curve. 

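By the Data Processing Inequality [10, Th. 2.8.1], $C(P, \mathcal{D})$
is upper-bounded by the capacity of the unquantized channel
[4]

$$C(P, \mathcal{D}) \leq \frac{1}{2} \log\left(1 + \frac{P}{\sigma^2}\right). \tag{19}$$

Consequently, by (10) and (14)

$$\dot{C}(0, \mathcal{D}) \leq \frac{1}{2\sigma^2} \quad \text{and} \quad \dot{C}(0) \leq \frac{1}{2\sigma^2}. \tag{20}$$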



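A ubiquitous quantizer is the symmetric threshold quantizer,
for which $\mathcal{D} = \{\tilde{y} \in \mathbb{R} : \tilde{y} \geq 0\}$. For this quantizer the
capacity $C_{\text{sym}}(P)$ is given by [12, Th. 2], [2, Eq. (3.4.18)]

$$C_{\text{sym}}(P) = \log 2 - H_b\left(Q\left(\frac{\sqrt{P}}{\sigma}\right)\right) \tag{21}$$

where $H_b(\cdot)$ denotes the binary entropy function

$$H_b(p) \triangleq -p \log p - (1 - p) \log(1 - p), \quad 0 \leq p \leq 1 \tag{22}$$

(where we define $0 \log 0 \triangleq 0$) and $Q(\cdot)$ denotes the $Q$-function

$$Q(x) \triangleq \frac{1}{\sqrt{2\pi}} \int_x^\infty e^{-t^2/2} \, dt, \quad x \in \mathbb{R}. \tag{23}$$

The capacity $C_{\text{sym}}(P)$ can be achieved by transmitting $\sqrt{P}$ and
$-\sqrt{P}$ equiprobably.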

From (21), the capacity per unit-energy $\dot{C}_{\text{sym}}(0)$ for a symmetric
threshold quantizer can be computed as [2, Eq. (3.4.20)]

$$\dot{C}_{\text{sym}}(0) = \lim_{P \downarrow 0} \frac{C_{\text{sym}}(P)}{P} = \frac{1}{\pi\sigma^2}. \tag{24}$$

This is a factor of $2/\pi$ smaller than the capacity per unit-energy
$1/(2\sigma^2)$ of the Gaussian channel without output quantization.
Thus, quantizing the channel output by a symmetric
threshold quantizer causes a loss of roughly 2 dB.

It is tempting to attribute this loss to the fact that the 
quantizer discards information on the received signal's mag- 
nitude and allows the decoder to perform only hard-decision 
decoding. However, as we shall see, the loss of 2dB is 
not a consequence of the hard-decision decoder but of the 
suboptimal quantizer. In fact, with an asymmetric threshold
quantizer the loss vanishes (Theorem 2). 
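The 2 dB figure can be checked numerically. The sketch below evaluates the symmetric-quantizer capacity (21) at a small power and compares its slope with the bound in (20). The function names and the choice of unit noise variance are our assumptions, made only for illustration.

```python
import math

def q_func(x):
    """Gaussian tail probability Q(x) of (23), via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def binary_entropy(p):
    """Binary entropy H_b(p) of (22) in nats, with 0 log 0 taken as 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)

def c_sym(power, sigma):
    """Capacity (21) of the symmetric threshold quantizer, nats per channel use."""
    return math.log(2.0) - binary_entropy(q_func(math.sqrt(power) / sigma))

sigma = 1.0
small_power = 1e-4
slope_sym = c_sym(small_power, sigma) / small_power   # approximates (24)
slope_limit = 1.0 / (math.pi * sigma**2)              # right-hand side of (24)
slope_unquantized = 1.0 / (2.0 * sigma**2)            # bound in (20)
loss_db = 10.0 * math.log10(slope_unquantized / slope_limit)
print(slope_sym, slope_limit, loss_db)
```

The printed loss is 10 log10(π/2), i.e., slightly below 2 dB, which is where the "roughly 2 dB" rule of thumb comes from.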

III. Main Results 

Our main results are presented in the following two sections. 
Section III-A presents the results concerning channel capacity. 
We show that the capacity-achieving input distribution is 
discrete with at most three mass points and that threshold 
quantizers achieve the capacity (Theorem 1). Furthermore, 
we provide an expression for the capacity when the average- 
power constraint (3) is replaced by a peak-power constraint 
(Proposition 1). 

Section III-B presents the results on the capacity per unit- 
energy. We show that with an asymmetric threshold quantizer 
and asymmetric signal constellations, the capacity per unit- 
energy of the Gaussian channel can be achieved (Theorem 2), 
thus demonstrating that quantizing the output of the Gaussian 
channel with a one-bit quantizer does not cause an asymptotic 
power loss. We further demonstrate that flash-signaling input 
distributions [3, Def. 2] are required in order to achieve this 
capacity per unit-energy (Theorem 3). Finally, we show that 
when the average-power constraint (3) is replaced by a peak- 
power constraint, then quantizing the output of the Gaussian 
channel with a one-bit quantizer causes a 2dB power loss 
(Proposition 2). 
A. Channel Capacity

Theorem 1 (Optimal Input Distribution and Quantizer):

1) For any given maximal-allowed average-power $P$ and
any Borel set $\mathcal{D}$, the supremum in (4) defining $C(P, \mathcal{D})$
is achieved by some input distribution that is concentrated
on at most three points.

2) For any given maximal-allowed average-power $P$ the
supremum in (6) is achieved by some threshold quantizer

$$\mathcal{D}^\star = \{\tilde{y} \in \mathbb{R} : \tilde{y} \geq T\}$$

(where $T \geq 0$ depends on $P$ and $\sigma^2$) and by a zero-mean,
variance-$P$ input distribution that is concentrated
on at most three points.
Proof: See Section VII. ■
The result that the capacity-achieving input distribution is concentrated
on at most three points is consistent with Theorem 1
in [12], which shows that if the quantization regions of a
$K$-bit quantizer partition the real line into $2^K$ intervals, then
the capacity-achieving input distribution is concentrated on at
most $2^K + 1$ points.

Proposition 1: If the average-power constraint (3) is replaced
by the peak-power constraint

$$X_k^2 \leq P, \quad k \in \mathbb{Z}, \quad \text{with probability one} \tag{25}$$



then the capacity of the channel presented in Section II is 
given by 



C, 



'pp(P) = maxJ log(l + e-®(P^^)) 




where 



e(p,T) = 



1 - 



(27) 



The capacity can be achieved by a binary input distribution
with mass points at $\sqrt{P}$ and $-\sqrt{P}$ and by some threshold
quantizer with threshold $T \geq 0$.

Proof: See Section IX-A. ■ 
Numerical evaluation of (26) suggests that, for every maximal- 
allowed peak-power P, the maximum is attained for T = 0. 
In this case Cpp(P) specializes to the capacity of the average- 
power-limited Gaussian channel with symmetric output quan- 
tization (21). Thus, if the channel is peak-power limited, then 
a symmetric threshold quantizer achieves capacity. 
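The observation that a zero threshold is best under a peak-power constraint can be explored with a brute-force search. The sketch below does not evaluate the closed-form expression of Proposition 1; instead it assumes, as the proposition states, that binary inputs at $\pm\sqrt{P}$ suffice, and maximizes the mutual information of the resulting binary channel over a grid of input probabilities and nonnegative thresholds. The grid sizes, the values of $P$ and $\sigma$, and all function names are our assumptions.

```python
import math

def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def binary_entropy(p):
    """Binary entropy in nats, with 0 log 0 taken as 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)

def mutual_information(prob_plus, a, b):
    """I(X;Y) in nats when Pr(Y=1|X=+sqrt(P)) = a and Pr(Y=1|X=-sqrt(P)) = b."""
    py1 = prob_plus * a + (1.0 - prob_plus) * b
    return (binary_entropy(py1)
            - prob_plus * binary_entropy(a)
            - (1.0 - prob_plus) * binary_entropy(b))

def best_rate(power, sigma, threshold, grid=2001):
    """Maximize I(X;Y) over a grid of input probabilities for a fixed threshold."""
    root = math.sqrt(power)
    a = q_func((threshold - root) / sigma)
    b = q_func((threshold + root) / sigma)
    return max(mutual_information(i / (grid - 1), a, b) for i in range(grid))

power, sigma = 1.0, 1.0
thresholds = [0.05 * i for i in range(41)]       # grid over 0 <= T <= 2
rates = [best_rate(power, sigma, t) for t in thresholds]
best_threshold = thresholds[rates.index(max(rates))]
symmetric_capacity = math.log(2.0) - binary_entropy(q_func(math.sqrt(power) / sigma))
print(best_threshold, max(rates), symmetric_capacity)
```

For these parameter values the search selects the zero threshold and reproduces the symmetric-quantizer capacity; this is a numerical indication on one grid, not a proof.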

B. Capacity Per Unit-Energy 

Theorem 2 ($\dot{C}(0) = 1/(2\sigma^2)$): The capacity per unit-energy
of the channel presented in Section II is

$$\dot{C}(0) = \frac{1}{2\sigma^2}. \tag{28}$$

Proof: See Section VIII-A. ■
Thus, if we allow for asymmetric threshold quantizers and 
asymmetric signal constellations, then quantizing the output
of the average-power-limited Gaussian channel with a one-bit
quantizer does not cause a loss with respect to the capacity 
per unit-energy. 

Considering the symmetry of the probability density func- 
tion (PDF) of the Gaussian noise, it is perhaps surprising 
that an asymmetric quantizer yields a larger rate per unit- 
energy than a symmetric one. However, the input distribution 
achieving (28) is asymmetric (see below). Hence, the PDF 
of the unquantized channel output is asymmetric, and it 
seems therefore plausible that the capacity per unit-energy 
is achieved for some asymmetric quantizer. In fact, even if
the PDF of the unquantized channel output were symmetric, 
this would not necessarily imply that the optimal quantizer is 
symmetric. There exist symmetric PDFs for which the optimal 
one-bit quantizer with respect to the mean squared error is 
asymmetric, see, e.g., [13, Ex. 5.2, p. 64-65]. 

Theorem 2 is proved by analyzing (15) for a judicious
choice of $\mathcal{D}$ and $\xi$. In Section IV we provide an alternative
proof by presenting a PPM scheme that achieves the capacity 
per unit-energy (28). For this scheme, the error probability 
can be analyzed directly using the Union Bound and an 
upper bound on the Q-function: there is no need to resort to 
conventional methods used to prove coding theorems, such 
as the method of types, information-spectrum methods, or 
random coding exponents. 
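To get a feeling for how the ratio in (15) behaves for threshold quantizers, the following sketch evaluates the relative entropy per unit energy for a growing mass point $\xi$ and a threshold that trails $\xi$ by a slowly growing gap. The specific threshold schedule $T = \xi - \sigma\sqrt{\xi/\sigma}$ is our assumption, chosen only for illustration; it is not claimed to be the choice made in the proof.

```python
import math

def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def divergence_per_energy(xi, threshold, sigma):
    """D(P_{Y|X=xi} || P_{Y|X=0}) / xi^2 for a threshold quantizer."""
    p1 = q_func((threshold - xi) / sigma)   # Pr(Y = 1 | X = xi)
    q1 = q_func((xi - threshold) / sigma)   # Pr(Y = 0 | X = xi)
    p0 = q_func(threshold / sigma)          # Pr(Y = 1 | X = 0)
    q0 = q_func(-threshold / sigma)         # Pr(Y = 0 | X = 0)
    divergence = p1 * math.log(p1 / p0) + q1 * math.log(q1 / q0)
    return divergence / xi**2

sigma = 1.0
ratios = []
for xi in (5.0, 10.0, 20.0, 35.0):
    threshold = xi - sigma * math.sqrt(xi / sigma)   # assumed schedule
    ratios.append(divergence_per_energy(xi, threshold, sigma))
print(ratios)
```

The ratios increase with $\xi$, exceed the symmetric-quantizer value $1/(\pi\sigma^2)$ for the largest $\xi$ shown, and remain below the bound $1/(2\sigma^2)$ of (20). The convergence toward $1/(2\sigma^2)$ is slow, and larger $\xi$ would require evaluating the $Q$-function in the log domain to avoid underflow.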

The capacity per unit-energy (28) can be achieved by
binary on-off keying, i.e., by binary inputs of probability mass
function

$$\Pr(X = \xi) = 1 - \Pr(X = 0) = \frac{P}{\xi^2}, \tag{29}$$

where the absolute value of the nonzero mass point $|\xi|$ tends
to infinity as $P$ tends to zero. The distribution of such inputs
belongs to the class of flash-signaling input distributions,
which was defined by Verdú [3, Def. 2] as follows.

Definition 1 (Flash Signaling): A family of distributions
of $X$ parametrized by $P$ is said to be flash signaling if it
satisfies $E[X^2] \leq P$ and for every positive $\nu$

$$\lim_{P \downarrow 0} \frac{E\left[X^2 \, I\{|X| > \nu\}\right]}{P} = 1. \tag{30}$$



Here I {statement} denotes the indicator function: it is equal 
to one if the statement between the curly brackets is true and 
is equal to zero otherwise. 

Flash signaling is described in [3] as "the mixture of a 
probability distribution that asymptotically concentrates its 
mass at 0 and a probability distribution that migrates to
infinity; the weight of the latter vanishes sufficiently fast to 
satisfy the vanishing power constraint." The next theorem 
shows that flash signaling is necessary to achieve (28). 
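The flash-signaling condition can be made concrete with a small computation. The sketch below evaluates the fraction of the power that a discrete input places on mass points of magnitude above $\nu$, for on-off keying with a nonzero mass point that grows as $P$ shrinks and for equiprobable antipodal signaling at $\pm\sqrt{P}$. The growth schedule $\xi = P^{-1/4}$ and the values of $P$ and $\nu$ are our assumptions.

```python
def flash_ratio(mass_points, probabilities, power, nu):
    """E[X^2 * 1{|X| > nu}] / P for a discrete input distribution."""
    tail_energy = sum(p * x * x
                      for x, p in zip(mass_points, probabilities)
                      if abs(x) > nu)
    return tail_energy / power

def on_off_keying(power, xi):
    """On-off keying: Pr(X = xi) = P / xi^2 and X = 0 otherwise."""
    p_on = power / xi**2
    return [xi, 0.0], [p_on, 1.0 - p_on]

def antipodal(power):
    """Equiprobable mass points at +sqrt(P) and -sqrt(P)."""
    root = power ** 0.5
    return [root, -root], [0.5, 0.5]

nu = 1.0
power = 1e-3
xi = power ** -0.25                      # assumed schedule: xi grows as P shrinks
ook_ratio = flash_ratio(*on_off_keying(power, xi), power, nu)
antipodal_ratio = flash_ratio(*antipodal(power), power, nu)
print(ook_ratio, antipodal_ratio)
```

For on-off keying all of the power sits on the large mass point, so the ratio is one once $|\xi| > \nu$; for antipodal signaling the mass points shrink with $P$ and the ratio is zero for small $P$.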

Theorem 3 (Flash Signaling Is Required to Achieve $\dot{C}(0)$):
Every family of distributions of $X$ parametrized by $P$ that
satisfies $E[X^2] \leq P$ and

$$\lim_{P \downarrow 0} \frac{I(X; Y)}{P} = \frac{1}{2\sigma^2} \tag{31}$$



must be flash signaling. 

Proof: See Section VIII-B. 
It is easy to show that for flash-signaling input distributions,
threshold quantizers with a bounded threshold give rise to a 
zero rate per unit-energy. We thus have the following corollary. 

Corollary 1 (The Thresholds Must Be Unbounded): If (31) 
holds for some family of threshold quantizers (parametrized 
by the average power), then the thresholds must be unbounded 
in the average power. 

Proof: See Section VIII-C. ■ 

It can be shown that, for binary on-off keying where the 
nonzero mass point tends to infinity as P tends to zero, and 
for threshold quantizers where the threshold grows to infinity 
sufficiently slowly as P tends to zero, the quantized Gaussian 
channel is equivalent to a binary asymmetric channel with 
vanishing crossover probabilities 

$$\lim_{P \downarrow 0} \Pr(Y = 1 \mid X = 0) = \lim_{P \downarrow 0} \Pr(Y = 0 \mid X = \xi) = 0. \tag{32}$$

Thus, as P tends to zero, X can be accurately guessed from the 
quantized output Y . On the basis of this observation and of the 
observation that binary on-off keying achieves the capacity per 
unit-energy of the Gaussian channel irrespective of the location 
of the nonzero mass point [11] (see also [3]), it may seem 
plausible that quantizing the output of the Gaussian channel 
with a one-bit quantizer does not cause a loss with respect to 
capacity per unit-energy. Note, however, that the same line of 
argument could also be applied to the average-power-limited
Rayleigh-fading channel (see Section VI), but for this channel 
quantizing the output with a one-bit quantizer does cause a loss 
with respect to capacity per unit-energy (unless the receiver is 
cognizant of the realization of the fading) — see Theorems 4 
and 5. 

As mentioned in Section II, the capacity per unit-energy 
is equal to the slope at zero of the capacity-vs-power curve. 
Thus, Theorem 2 demonstrates that the first derivative of $C(P)$
at $P = 0$ is equal to $1/(2\sigma^2)$. Theorem 3 implies that the
second derivative of $C(P)$ at $P = 0$ is $-\infty$.

Corollary 2 ($\ddot{C}(0) = -\infty$):

$$\ddot{C}(0) = 2 \lim_{P \downarrow 0} \frac{C(P) - P \dot{C}(0)}{P^2} = -\infty. \tag{33}$$



Proof: By the Data Processing Inequality, we have for
every family of distributions of $X$ parametrized by $P$

$$\lim_{P \downarrow 0} \frac{I(X; Y) - \frac{P}{2\sigma^2}}{P^2} \leq \lim_{P \downarrow 0} \frac{I(X; \tilde{Y}) - \frac{P}{2\sigma^2}}{P^2}. \tag{34}$$

To achieve $\dot{C}(0)$ it is necessary to use flash signaling (Theorem 3).
And for all flash-signaling input distributions the right-hand
side (RHS) of (34) is $-\infty$ ([3, Th. 16]). Consequently,
so is its left-hand side (LHS). ■
Note that, for the Gaussian channel, the first and second
derivative of the capacity are [4]

$$\dot{C}_G(0) = \frac{1}{2\sigma^2} \quad \text{and} \quad \ddot{C}_G(0) = -\frac{1}{2\sigma^4} \tag{35}$$

(where "G" stands for "Gaussian"). Thus, while quantizing
the output of the Gaussian channel with a one-bit quantizer
does not cause a loss with respect to the first derivative of
the capacity-vs-power curve, it causes a substantial loss in
terms of the second derivative. The implications on the spectral
efficiency are discussed in Section V.

Proposition 2: If the average-power constraint (3) is replaced
by the peak-power constraint

$$X_k^2 \leq P, \quad k \in \mathbb{Z}, \quad \text{with probability one} \tag{36}$$

then the slope at zero of the capacity-vs-power curve is given
by

$$\lim_{P \downarrow 0} \frac{C_{\text{pp}}(P)}{P} = \frac{1}{\pi\sigma^2}. \tag{37}$$

Proof: See Section IX-B. ■
As was shown by Shannon [4], the capacity of the peak-power-limited
unquantized Gaussian channel satisfies

$$\lim_{P \downarrow 0} \frac{C_{G,\text{pp}}(P)}{P} = \frac{1}{2\sigma^2}. \tag{38}$$

Thus, in contrast to the average-power-limited case, quantizing 
the output of the peak-power-limited Gaussian channel with a 
one-bit quantizer does cause a 2dB power loss. 

IV. Pulse-Position Modulation 

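We next demonstrate that the capacity per unit-energy
(28) can be achieved using a PPM scheme; no random-coding
arguments are needed. For each message $m$ in
$\{1, 2, \ldots, \mathcal{M}\}$, the encoder produces the $\mathcal{M}$ channel inputs
$x_1(m), x_2(m), \ldots, x_{\mathcal{M}}(m)$, where

$$x_k(m) = \begin{cases} \xi & \text{if } k = m, \\ 0 & \text{if } k \neq m, \end{cases} \quad k = 1, \ldots, \mathcal{M} \tag{39}$$

and where $\xi^2 = E$. For a fixed rate per unit-energy

$$\dot{R}(0) \triangleq \frac{\log \mathcal{M}}{E}$$

we have

$$\xi = \sqrt{\frac{\log \mathcal{M}}{\dot{R}(0)}}. \tag{40}$$

Note that, while the rate per unit-energy is fixed, the rate of
this scheme is

$$\frac{\log \mathcal{M}}{\mathcal{M}}$$

and tends to zero as $\mathcal{M}$ tends to infinity.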
We employ a threshold quantizer (2) with the threshold
$T$ chosen as follows. Given any $0 < \epsilon < 1$, we choose
the threshold $T$ so that the probability that the quantizer
produces 0 given the channel input $\xi$ is equal to $\epsilon$, i.e.,

$$T = \xi - \sigma Q^{-1}(\epsilon) \tag{41}$$

which yields

\begin{align}
\Pr(Y_k = 0 \mid X_k = \xi) &= \epsilon \tag{42a} \\
\Pr(Y_k = 1 \mid X_k = 0) &= Q\left(\frac{\xi - \sigma Q^{-1}(\epsilon)}{\sigma}\right). \tag{42b}
\end{align}

Here $Q^{-1}(\cdot)$ denotes the inverse $Q$-function.

The decoder guesses "$\hat{M} = m$" provided that $Y_m = 1$ and
that $Y_k = 0$ for all $k \neq m$. If $Y_k = 1$ for more than one $k$, or
if $Y_k = 0$ for all $k = 1, 2, \ldots, \mathcal{M}$, then the decoder declares
an error.
Suppose that Message $M = m$ was transmitted. Then the
probability of an error is upper-bounded by

\begin{align}
\Pr(\text{error} \mid M = m) &= \Pr\Bigl( \bigcup_{k \neq m} (Y_k = 1) \cup (Y_m = 0) \,\Big|\, M = m \Bigr) \\
&\leq \sum_{k \neq m} \Pr(Y_k = 1 \mid X_k = 0) + \Pr(Y_m = 0 \mid X_m = \xi) \\
&= \sum_{k \neq m} \Pr(Y_k = 1 \mid X_k = 0) + \epsilon \\
&= (\mathcal{M} - 1) \Pr(Y_1 = 1 \mid X_1 = 0) + \epsilon \tag{43}
\end{align}

where the second step follows from the Union Bound; the
third step follows from (42a); and the fourth step follows
because the channel is memoryless, which implies that
$\Pr(Y_k = 1 \mid X_k = 0)$ does not depend on $k$. Since the RHS of
(43) does not depend on $m$, it follows that also the probability
of error

$$\Pr(\hat{M} \neq M) = \frac{1}{\mathcal{M}} \sum_{m=1}^{\mathcal{M}} \Pr(\text{error} \mid M = m)$$

is upper-bounded by (43).

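The first term on the RHS of (43) can be evaluated using
(42b) and (40):

\begin{align}
(\mathcal{M} - 1) \Pr(Y_1 = 1 \mid X_1 = 0) &= (\mathcal{M} - 1) \, Q\left(\frac{\xi - \sigma Q^{-1}(\epsilon)}{\sigma}\right) \\
&= (\mathcal{M} - 1) \, Q\left(\frac{\sqrt{\log \mathcal{M}} - \sigma Q^{-1}(\epsilon) \sqrt{\dot{R}(0)}}{\sigma \sqrt{\dot{R}(0)}}\right). \tag{44}
\end{align}

We continue by showing that if

$$\dot{R}(0) < \frac{1}{2\sigma^2}$$

then, for every fixed $0 < \epsilon < 1$, the RHS of (44) tends to zero
as $\mathcal{M}$ tends to infinity. Indeed,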



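\begin{align}
\lim_{\mathcal{M} \to \infty} (\mathcal{M} - 1) \, Q\left(\frac{\sqrt{\log \mathcal{M}} - \sigma Q^{-1}(\epsilon) \sqrt{\dot{R}(0)}}{\sigma \sqrt{\dot{R}(0)}}\right)
&\leq \lim_{\alpha \to \infty} \exp\left(\sigma^2 \dot{R}(0) \left(\alpha + Q^{-1}(\epsilon)\right)^2\right) Q(\alpha) \\
&\leq \lim_{\alpha \to \infty} \frac{1}{\sqrt{2\pi}\,\alpha} \exp\left(\sigma^2 \dot{R}(0) \left(\alpha + Q^{-1}(\epsilon)\right)^2 - \frac{\alpha^2}{2}\right) \tag{45}
\end{align}

where the first step follows by upper-bounding $\mathcal{M} - 1 < \mathcal{M}$
and by substituting

$$\alpha = \frac{\sqrt{\log \mathcal{M}} - \sigma Q^{-1}(\epsilon) \sqrt{\dot{R}(0)}}{\sigma \sqrt{\dot{R}(0)}}$$

and the second step follows from the inequality [14,
Prop. 19.4.2]

$$Q(\alpha) < \frac{1}{\sqrt{2\pi}\,\alpha} e^{-\alpha^2/2}, \quad \alpha > 0. \tag{46}$$

The RHS of (45) vanishes for $\dot{R}(0) < 1/(2\sigma^2)$.

Combining (45) with (43), we obtain that if $\dot{R}(0) < 1/(2\sigma^2)$
then the probability of error tends to $\epsilon$ as $E$, and hence, by
(40), also $\mathcal{M}$, tends to infinity. Since $\epsilon$ can be chosen arbitrarily
small, the probability of error can be made arbitrarily
small, thus proving that the capacity per unit-energy (28) is
achievable with the above PPM scheme.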
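The behavior of the union bound (43) can also be observed numerically. The sketch below sets the pulse energy from the number of messages and the rate per unit-energy, places the threshold so that a transmitted pulse is missed with probability $\epsilon$, and evaluates the bound for a growing number of messages. The parameter values ($\sigma = 1$, $\epsilon = 0.1$, a rate per unit-energy of 0.25) and the function names are our assumptions; the inverse $Q$-function is obtained from the standard library's normal distribution.

```python
import math
from statistics import NormalDist

def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def q_inv(eps):
    """Inverse Q-function, i.e., Q(q_inv(eps)) = eps."""
    return NormalDist().inv_cdf(1.0 - eps)

def ppm_error_bound(num_messages, rate_per_energy, sigma, eps):
    """Union bound (43): (M - 1) * Pr(Y_1 = 1 | X_1 = 0) + eps."""
    energy = math.log(num_messages) / rate_per_energy
    xi = math.sqrt(energy)                      # pulse amplitude, xi^2 = E
    threshold = xi - sigma * q_inv(eps)         # pulse missed with probability eps
    false_alarm = q_func(threshold / sigma)     # Pr(Y = 1 | X = 0)
    return (num_messages - 1) * false_alarm + eps

sigma, eps = 1.0, 0.1
rate = 0.25                                     # below 1 / (2 sigma^2) = 0.5
bounds = [ppm_error_bound(m, rate, sigma, eps) for m in (10**2, 10**4, 10**8)]
print(bounds)
```

For a rate per unit-energy below $1/(2\sigma^2)$ the false-alarm term shrinks as the number of messages grows, so the bound approaches $\epsilon$ from above.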

The fact that PPM achieves the capacity per unit-energy 
of the Gaussian channel with a threshold quantizer follows 
also from the analysis of the probability of error for block 
orthogonal signals [15, pp. 342-346]. The threshold a > 0
introduced to bound the RHS of (5.97d) in [15] can be 
identified as the threshold T of the quantizer. 

V. Spectral Efficiency 

The discrete-time channel presented in Section II is closely 
related to the (continuous-time) AWGN channel with one-bit 
output quantization. Indeed, suppose that the input to the latter 
channel is bandlimited to W Hz and that its average-power is 
limited to $P$, and suppose that the Gaussian noise is of double-sided
power spectral density $N_0/2$. Then, the discrete-time
channel (1) with noise variance

$$\sigma^2 = W N_0 \tag{47}$$

results from sampling the AWGN channel's output at the
Nyquist rate $2W$. The capacity (in bits per second) of the
AWGN channel with Nyquist sampling and one-bit output
quantization is given by

$$C^{(2W)}_{\text{AWGN}}(P) = \frac{2W}{\log 2} C(P) \tag{48}$$

where $C(P)$ is the capacity (6) of the discrete-time channel
in nats per channel use. Note, however, that when the channel
output is quantized, sampling at the Nyquist rate need not be
optimal with respect to capacity: see, e.g., [6]-[9] for scenarios
where sampling the quantizer's output above the Nyquist
rate provides capacity gains. Consequently, $C^{(2W)}_{\text{AWGN}}(P)$ is, in
general, a lower bound on the capacity of the (continuous-time)
AWGN channel with one-bit output quantization.

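The energy per information-bit when communicating with
power $P$ at rate $C^{(2W)}_{\text{AWGN}}(P)$ is defined as

$$\frac{E_b}{N_0} \triangleq \frac{P}{C^{(2W)}_{\text{AWGN}}(P)} \frac{1}{N_0} \tag{49}$$

which, by (47) and (48), is equal to

$$\frac{E_b}{N_0} = \frac{\log 2}{2\sigma^2} \frac{P}{C(P)}. \tag{50}$$

The spectral efficiency $\mathsf{C}(\cdot)$ (in bits per second per Hz) is
defined as

$$\mathsf{C}\left(\frac{E_b}{N_0}\right) \triangleq \frac{C^{(2W)}_{\text{AWGN}}(P)}{W} \tag{51}$$

which, by (48), is

$$\mathsf{C}\left(\frac{E_b}{N_0}\right) = \frac{2}{\log 2} C(P). \tag{52}$$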
In (51) and (52), $P$ is the solution to

$$\frac{E_b}{N_0} = \frac{P}{C^{(2W)}_{\text{AWGN}}(P)} \frac{1}{N_0}. \tag{53}$$

See [3] for a more thorough discussion of spectral efficiency.
(Note that, in contrast to (1), the channel considered in [3]
is complex-valued. Therefore, the expressions for $E_b/N_0$ and
$\mathsf{C}(E_b/N_0)$ differ by a factor of two.)

The minimum $E_b/N_0$ required for reliable communication
is determined by taking the infimum over $P$ of the RHS of
(50). By (12) this yields [3, Eq. (35)]

$$\left(\frac{E_b}{N_0}\right)_{\min} = \frac{\log 2}{2\sigma^2} \frac{1}{\dot{C}(0)}. \tag{54}$$

Furthermore, the slope of $E_b/N_0 \mapsto \mathsf{C}(E_b/N_0)$ at $(E_b/N_0)_{\min}$
in bits per second per Hz per 3 dB is given by [3, Th. 9]²

$$S_0 = \frac{4 [\dot{C}(0)]^2}{-\ddot{C}(0)}. \tag{55}$$



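By (28) and (33), we have that for the average-power-limited
Gaussian channel with one-bit output quantization

$$\dot{C}(0) = \frac{1}{2\sigma^2} \quad \text{and} \quad \ddot{C}(0) = -\infty \tag{56}$$

which yields

\begin{align}
\left(\frac{E_b}{N_0}\right)_{\min} &= \log 2 = -1.59\ \text{dB} \tag{57a} \\
S_0 &= 0\ \frac{\text{bps/Hz}}{3\ \text{dB}}. \tag{57b}
\end{align}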



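In comparison, for the unquantized Gaussian channel (35)

$$\dot{C}_G(0) = \frac{1}{2\sigma^2} \quad \text{and} \quad \ddot{C}_G(0) = -\frac{1}{2\sigma^4} \tag{58}$$

and for the Gaussian channel with symmetric one-bit output
quantization (21)

$$\dot{C}_{\text{sym}}(0) = \frac{1}{\pi\sigma^2} \quad \text{and} \quad \ddot{C}_{\text{sym}}(0) = \frac{2}{3\pi\sigma^4} \left(\frac{1}{\pi} - 1\right). \tag{59}$$

This yields

\begin{align}
\left(\frac{E_b}{N_0}\right)_{\min,G} &= \log 2 = -1.59\ \text{dB} \tag{60a} \\
S_{0,G} &= 2\ \frac{\text{bps/Hz}}{3\ \text{dB}} \tag{60b}
\end{align}

and

\begin{align}
\left(\frac{E_b}{N_0}\right)_{\min,\text{sym}} &= \frac{\pi}{2} \log 2 = 0.37\ \text{dB} \tag{61a} \\
S_{0,\text{sym}} &= \frac{6}{\pi - 1}\ \frac{\text{bps/Hz}}{3\ \text{dB}}. \tag{61b}
\end{align}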
Comparing (61a) with (60a), we see once more that quantizing
the output of the Gaussian channel with a symmetric threshold
quantizer causes a power loss of roughly 2 dB. We further
observe that with an asymmetric threshold quantizer we can
recover the loss in terms of $(E_b/N_0)_{\min}$, but there is still
a substantial loss in terms of spectral efficiency. Indeed,
for the Gaussian channel with one-bit output quantization,
the wideband slope $S_0$ is zero, whereas for the unquantized
Gaussian channel it is 2 bits per second per Hz per 3 dB.

²Again, the channel considered in [3] is complex-valued and the expressions
for $(E_b/N_0)_{\min}$ and $S_0$ therefore differ by a factor of two. Nevertheless, since
the capacity of the complex-valued channel is twice the capacity of the real-valued
channel, it follows that the numerical values of $(E_b/N_0)_{\min}$ and $S_0$
are the same as in [3].

Figure 2. Spectral efficiency versus energy per information-bit. The top figure
shows the spectral efficiencies of the Gaussian channel with and without one-bit
output quantization. The bottom figure compares the spectral efficiency for
the optimal one-bit quantizer with that for the symmetric threshold quantizer.
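The low-SNR figures of merit quoted in this section follow mechanically from the first and second derivatives of the capacity at zero power. The sketch below recomputes the minimum $E_b/N_0$ (in dB) and the wideband slope from those derivatives for the three channels discussed: unquantized, symmetric one-bit, and optimal one-bit. The function names, the dictionary layout, and the choice $\sigma = 1$ are our assumptions; an infinite second derivative is mapped to a slope of zero.

```python
import math

def eb_n0_min_db(c_dot, sigma):
    """Minimum E_b/N_0 in dB: (log 2 / (2 sigma^2)) / c_dot."""
    return 10.0 * math.log10(math.log(2.0) / (2.0 * sigma**2 * c_dot))

def wideband_slope(c_dot, c_ddot):
    """Wideband slope 4 c_dot^2 / (-c_ddot), in bps/Hz per 3 dB."""
    if math.isinf(c_ddot):
        return 0.0
    return 4.0 * c_dot**2 / (-c_ddot)

sigma = 1.0
s2, s4 = sigma**2, sigma**4
channels = {
    "unquantized": (1 / (2 * s2), -1 / (2 * s4)),
    "symmetric one-bit": (1 / (math.pi * s2),
                          (2 / (3 * math.pi * s4)) * (1 / math.pi - 1)),
    "optimal one-bit": (1 / (2 * s2), -math.inf),
}
results = {name: (eb_n0_min_db(c_dot, sigma), wideband_slope(c_dot, c_ddot))
           for name, (c_dot, c_ddot) in channels.items()}
for name, (eb_db, slope) in results.items():
    print(f"{name}: (Eb/N0)_min = {eb_db:.2f} dB, S0 = {slope:.2f}")
```

The computation makes the trade-off explicit: the optimal one-bit quantizer matches the unquantized channel in minimum energy per bit but has zero wideband slope, whereas the symmetric quantizer pays in minimum energy per bit but keeps a nonzero slope.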

The above spectral efficiencies are shown in Figure 2. The
top subfigure shows the spectral efficiencies of the Gaussian
channel with and without one-bit output quantization. The
bottom subfigure compares the spectral efficiency $\mathsf{C}(\cdot)$ for
the optimal one-bit quantizer with the spectral efficiency
$\mathsf{C}_{\text{sym}}(\cdot)$ for the symmetric threshold quantizer. We observe
that, even though the minimum energy per information-bit is
the same with and without one-bit output quantization,³ the
corresponding spectral efficiencies differ substantially for all
$E_b/N_0$. We further observe that for spectral efficiencies above
0.02 bits per second per Hz a symmetric threshold quantizer
is nearly optimal.

³For numerical reasons, the spectral efficiency of the Gaussian channel with
one-bit output quantization can only be shown for $E_b/N_0$ above $-0.5$ dB.

We conclude that, for communication systems that operate
at very low spectral efficiencies (such as Spread-Spectrum or
Ultra-Wideband systems), asymmetric quantizers are beneficial,
although for most practical scenarios the potential power
gain is significantly smaller than 2 dB. For example, at a
spectral efficiency of 0.001 bits per second per Hz, allowing
for asymmetric quantizers with corresponding asymmetric
signal constellations provides a power gain of roughly 0.1 dB.

VI. One-Bit Quantizers for Fading Channels 

For the average-power-limited (real-valued) Gaussian channel,
we have demonstrated that by allowing for asymmetric
threshold quantizers with corresponding asymmetric signal
constellations, one can achieve the capacity per unit-energy
of the unquantized channel. The same holds for the
average-power-limited complex-valued Gaussian channel [16]: using
binary on-off keying (29) and a radial quantizer (which
produces 1 if the magnitude of the channel output is above
some threshold and produces 0 otherwise), one can achieve
the capacity per unit-energy of the unquantized channel by
judiciously choosing the threshold and the nonzero mass point
as functions of the SNR.

In this section we briefly discuss the effect of one-bit
quantization on the capacity per unit-energy of the
discrete-time, average-power-limited Rayleigh-fading channel. This
channel's unquantized output $\tilde{Y}_k$ is given by

\[ \tilde{Y}_k = H_k x_k + Z_k, \quad k \in \mathbb{Z} \tag{62} \]

where $\{H_k, k \in \mathbb{Z}\}$ and $\{Z_k, k \in \mathbb{Z}\}$ are independent
sequences of i.i.d., zero-mean, circularly-symmetric, complex
Gaussian random variables, the former with unit variance and
the latter with variance $\sigma^2$. We say that the channel is coherent
if the receiver is cognizant of the realization of $\{H_k, k \in \mathbb{Z}\}$
and that it is noncoherent if the receiver is only cognizant of
the statistics of $\{H_k, k \in \mathbb{Z}\}$.

The unquantized output $\tilde{Y}_k$ is quantized using a one-bit
quantizer that is specified by a Borel subset $\mathcal{D}$ of the complex
field $\mathbb{C}$: it produces 1 if $\tilde{Y}_k$ is in $\mathcal{D}$, and it produces 0 if it is
not.

The capacities $C(P, \mathcal{D})$ and $C(P)$ are defined as in Section II
but with the square replaced by the squared magnitude
in the average-power constraint (3). Likewise, the capacities
per unit-energy $\dot{C}(0, \mathcal{D})$ and $\dot{C}(0)$ are defined as in Section II
but with the square replaced by the squared magnitude in the
energy constraint (7).

A. Coherent Fading Channels 

Using the same arguments as in Section II, it can be shown
that, for a fixed quantizer $\mathcal{D}$, we have for the coherent channel
[11, Th. 3], [3]

\[ \dot{C}(0, \mathcal{D}) = \sup_{\xi \neq 0} \frac{D\bigl(P_{Y|H,X=\xi} \,\big\|\, P_{Y|H,X=0} \,\big|\, P_H\bigr)}{|\xi|^2} \tag{63} \]

where $D(\cdot \| \cdot \,|\, \cdot)$ denotes conditional relative entropy

\[ D\bigl(P_{Y|H,X=\xi} \,\big\|\, P_{Y|H,X=0} \,\big|\, P_H\bigr) = \int D\bigl(P_{Y|H=h,X=\xi} \,\big\|\, P_{Y|H=h,X=0}\bigr) \, dP_H(h); \tag{64} \]

$P_H$ denotes the distribution of the fading $H$; and $P_{Y|H=h,X=x}$
denotes the output distribution conditioned on $(H, X) = (h, x)$.
(This can be shown along the lines of the proof of
Theorem 3 in [11] but with the mutual information $I(X;Y)$
replaced by the conditional mutual information $I(X;Y|H)$.
That the RHS of (63) is an upper bound on $\dot{C}(0, \mathcal{D})$ follows
then immediately from [11, Eq. (15)]. Showing that this holds
with equality requires swapping the order of taking the limit
as $P$ tends to zero and of computing the expectation over the
fading.) It can be further shown that



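\[ \dot{C}(0) = \sup_{\mathcal{D},\, \xi \neq 0} \frac{D\bigl(P_{Y|H,X=\xi} \,\big\|\, P_{Y|H,X=0} \,\big|\, P_H\bigr)}{|\xi|^2}. \tag{65} \]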



By the Data Processing Inequality, the capacity per unit-energy
is upper-bounded by that of the unquantized channel
[17], [3]

\[ \dot{C}(0) \le \frac{1}{\sigma^2}. \tag{66} \]



We next show that, by choosing the one-bit quantizer as a 
function of H and the SNR, this upper bound can be achieved. 

Theorem 4 (Coherent Case): The capacity per unit-energy
of the coherent Rayleigh-fading channel is given by

\[ \dot{C}(0) = \frac{1}{\sigma^2}. \tag{67} \]

It can be achieved by a family of radial quantizers
(parametrized by the average power) with thresholds that are
proportional to $|H|$.

Proof: See Section X-A. ■
Remark 1: The assumption that the fading $H$ is Gaussian
is not essential. In fact, in the coherent case, (67) holds for
every fading distribution satisfying $E[|H|^2] = 1$.

B. Noncoherent Fading Channels 

Using the same arguments as in Section II, it can be shown
that in the noncoherent case

\[ \dot{C}(0) = \sup_{\mathcal{D},\, \xi \neq 0} \frac{D\bigl(P_{Y|X=\xi} \,\big\|\, P_{Y|X=0}\bigr)}{|\xi|^2}. \tag{68} \]

Since the capacity per unit-energy of the unquantized
Rayleigh-fading channel equals $1/\sigma^2$ irrespective of whether
the channel is coherent or not [17], [3], it follows from
the Data Processing Inequality that (66) holds also in the
noncoherent case.

The capacity per unit-energy of the coherent channel with
one-bit output quantization can be achieved using binary
on-off keying where the nonzero mass point tends to infinity as
the SNR tends to zero. This result might mislead one to think
that (67) also holds in the noncoherent case. Indeed, in the
absence of a quantizer, binary on-off keying with diverging
nonzero mass point achieves the capacity per unit-energy $1/\sigma^2$
irrespective of whether the receiver is cognizant of the fading
realization or not [3], [17]. It might therefore seem plausible
that also in the noncoherent case quantizing the channel output
with a one-bit quantizer would cause no loss in the capacity
per unit-energy. But this is not the case:

Theorem 5 (Noncoherent Case): For the noncoherent
Rayleigh-fading channel with one-bit output quantization

\[ \dot{C}(0) < \frac{1}{\sigma^2}. \tag{69} \]

Proof: See Section X-B. ■
The capacity of fading channels with one-bit output
quantization was also studied in [18]-[22]. However, in [18]-[21]
the real and imaginary parts of $\tilde{Y}_k$ are quantized separately
using a one-bit quantizer for each rather than quantizing $\tilde{Y}_k$
directly using a one-bit quantizer. Furthermore, [18]-[21] do
not maximize over all possible quantizers.

VII. Proof of Theorem 1 
We prove Theorem 1 in 5 steps: 

1) We first show that for any given maximal-allowed
average-power $P$ and any Borel set $\mathcal{D}$, the supremum
in (4) defining $C(P, \mathcal{D})$ is achieved by some input
distribution that is concentrated on at most three points
(Section VII-A).

2) We next show that for every three-mass-points input
distribution, the supremum over all quantizers can be
replaced with the supremum over all threshold quantizers
and all quantizers whose quantization region consists
of a finite interval (Section VII-B).

3) We continue by showing that the supremum in (6)
defining $C(P)$ is achieved (Section VII-C).

4) We then show that threshold quantizers are optimal by
demonstrating that quantization regions consisting of a
finite interval are suboptimal (Section VII-D).

5) We finally show that the capacity-achieving input
distribution must be centered and must satisfy the
average-power constraint with equality (Section VII-E).

A. Input Distributions Consisting of Three Mass Points 

Generalizing the proof of Theorem 1 in [12] to arbitrary
quantizers, we prove that for every fixed quantizer $\mathcal{D}$ and
maximal-allowed average-power $P$, the capacity $C(P, \mathcal{D})$ is
achieved by an input distribution consisting of three (or fewer)
mass points. To this end we first argue that we can introduce
an additional peak-power constraint without reducing capacity,
provided that we allow the maximal-allowed peak-power to
tend to infinity. Thus, we show that $C(P, \mathcal{D})$, which is defined
in (4) without a peak-power constraint, can also be expressed
as

\[ C(P, \mathcal{D}) = \lim_{A \to \infty} \; \sup_{E[X^2] \le P,\ |X| \le A} I(P_X, W_{\mathcal{D}}) \tag{70} \]

where $W_{\mathcal{D}}$ denotes the channel law corresponding to the
quantization region $\mathcal{D}$, and where $I(P_X, W_{\mathcal{D}})$ denotes the
mutual information of a channel with law $W_{\mathcal{D}}$ when the input
is distributed according to $P_X$. Clearly, the RHS of (70) cannot
exceed its LHS, because imposing an additional peak-power
constraint cannot increase capacity. It remains to prove that
the LHS cannot exceed the RHS.

By Fano's inequality [10, Th. 2.11.1] and the Data Processing
Inequality, we have that, for every blocklength $n$, every
encoder $m \mapsto (x_1(m), \ldots, x_n(m))$ of rate $R = \frac{\log \mathsf{M}}{n}$
(with $\mathsf{M}$ the number of messages) that
satisfies the average-power constraint, and every quantization
region $\mathcal{D}$, the probability of error is lower-bounded by [10,
Sec. 8.9]

\[ \Pr(\hat{M} \neq M) \ge 1 - \frac{1}{nR} \sum_{k=1}^{n} I\bigl(X_k(M); Y_k\bigr) - \frac{\log 2}{nR}. \tag{71} \]

Let $A_n$ be the largest magnitude of the symbols that the
encoder can produce

\[ A_n = \max_{1 \le k \le n,\ 1 \le m \le \mathsf{M}} |x_k(m)| \tag{72} \]

so

\[ |x_k(m)| \le A_n, \quad (k = 1, 2, \ldots, n,\ m = 1, 2, \ldots, \mathsf{M}). \tag{73} \]

With this notation we have for every blocklength $n$ and every
quantizer $\mathcal{D}$,

\[ \frac{1}{n} \sum_{k=1}^{n} I\bigl(X_k(M); Y_k\bigr) \le \sup_{E[X^2] \le P,\ |X| \le A_n} I(P_X, W_{\mathcal{D}}) \le \sup_{A > 0} \; \sup_{E[X^2] \le P,\ |X| \le A} I(P_X, W_{\mathcal{D}}) \tag{74} \]

where the first inequality follows from (73) and by the
concavity in $P$ of

\[ \sup_{E[X^2] \le P,\ |X| \le A_n} I(P_X, W_{\mathcal{D}}). \]

Thus, the RHS of (71) is bounded away from zero whenever
$R$ exceeds the RHS of (74), and the inequality

\[ C(P, \mathcal{D}) \le \sup_{A > 0} \; \sup_{E[X^2] \le P,\ |X| \le A} I(P_X, W_{\mathcal{D}}) \tag{75} \]

is established. Since the inner supremum on the RHS of (75)
is monotonically nondecreasing in $A$, we can replace the outer
supremum by a limit and thus establish (70).

Introducing a peak-power constraint in (70) allows us next
to establish the existence of a capacity-achieving input
distribution of three mass points using Dubins's Theorem as
follows. Recall that by (70)

\[ C(P, \mathcal{D}) = \lim_{A \to \infty} C_{\mathcal{D},A}(P) \tag{76} \]

where $C_{\mathcal{D},A}(P)$ denotes the capacity of the memoryless
channel $\Pr(\tilde{Y} \in \mathcal{D} \mid X = x)$ with the input $X$ taking values in
the interval $[-A, A]$ and with the binary output $Y$:

\[ C_{\mathcal{D},A}(P) = \sup_{E[X^2] \le P,\ |X| \le A} I(P_X, W_{\mathcal{D}}). \tag{77} \]

Proceeding along the lines of [23, Sec. II-C] but accounting
for the additional average-power constraint, it can be shown
that $C_{\mathcal{D},A}(P)$ is achieved by an input distribution consisting
of three mass points. Indeed, since $P \mapsto C_{\mathcal{D},A}(P)$ is concave
it is continuous, so there exists some $P' \le P$ such that

\[ C_{\mathcal{D},A}(P) = \sup_{E[X^2] = P',\ |X| \le A} I(P_X, W_{\mathcal{D}}). \tag{78} \]

The input distribution achieving $C_{\mathcal{D},A}(P)$ must thus be
concentrated on the interval $[-A, A]$ and additionally satisfy

\[ \int x^2 \, dP_X(x) = P'. \tag{79} \]

The arguments in [23, Sec. II-C] thus go through with the set A
in [23, Sec. II-C] replaced by the set of input distributions that
induce the given output distribution and that additionally lie
on the hyperplane (79).

Having established that under an additional peak-power
constraint capacity is achieved by a three-mass-points input
distribution, we now study what happens to these three mass
points as the allowed peak-power tends to infinity. We thus
study how the three mass points at locations

\[ \boldsymbol{\xi} = (\xi_L, \xi_M, \xi_R) \]

with corresponding masses

\[ \mathbf{p} = (p_L, p_M, p_R) \]

behave as $A$ tends to infinity.

By possibly considering a subsequence of peak powers, we
can assume that, as $A$ tends to infinity, $\boldsymbol{\xi}$ converges to some
$\boldsymbol{\xi}^* = (\xi_L^*, \xi_M^*, \xi_R^*)$ whose components are in the extended real
line $\mathbb{R} \cup \{\pm\infty\}$. Likewise we can assume that $\mathbf{p}$ converges to
some probability vector $\mathbf{p}^*$. Since the input distributions must
satisfy the average-power constraint, if any of the components
of $\boldsymbol{\xi}^*$ is $\pm\infty$, then the corresponding component of $\mathbf{p}^*$ must
be zero. By Lemma 1 (Appendix I), $\Pr(\tilde{Y} \in \mathcal{D} \mid X = \xi_\ell)$
converges to $\Pr(\tilde{Y} \in \mathcal{D} \mid X = \xi_\ell^*)$ whenever $\xi_\ell^* \in \mathbb{R}$, and the
continuity of the expression

\[ C_{\mathcal{D},A}(P) = H_b\Bigl( \sum_{\ell \in \{L,M,R\}} p_\ell \Pr(\tilde{Y} \in \mathcal{D} \mid X = \xi_\ell) \Bigr) - \sum_{\ell \in \{L,M,R\}} p_\ell \, H_b\bigl( \Pr(\tilde{Y} \in \mathcal{D} \mid X = \xi_\ell) \bigr) \]

in the masses and in the transition probabilities
demonstrates that $\lim_{A \to \infty} C_{\mathcal{D},A}(P)$ (which equals $C(P, \mathcal{D})$
by (70)) equals the mutual information corresponding to
$(\mathbf{p}^*, \boldsymbol{\xi}^*)$, provided that in computing the latter the mass points
of zero mass are ignored. Since the mass points at $\pm\infty$ are of
zero mass (by the average-power constraint), those are ignored,
and we conclude that $C(P, \mathcal{D})$ is achieved by (at most) three
finite mass points. For sufficiently large $A$ (exceeding the
largest of these mass points) the peak-power constraint is
inactive.

B. Quantizers for Three-Mass-Points Input Distributions 

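Having established that for any quantizer $\mathcal{D}$ the capacity
$C(P, \mathcal{D})$ is achieved by a three-mass-points input distribution,
we now fix some arbitrary three-mass-points input
distribution $P_X$ concentrated at $(\xi_1, \xi_2, \xi_3)$ and study the quantizer
that maximizes the mutual information $I(P_X, W_{\mathcal{D}})$
corresponding to it.* (Without loss of generality, we assume that
$\xi_1 \neq \xi_2$, $\xi_1 \neq \xi_3$, and $\xi_2 \neq \xi_3$.) We will show that when $P_X$
is a three-mass-points input distribution,

\[ \sup_{\mathcal{D}} I(P_X, W_{\mathcal{D}}) = \sup_{T_1 \le T_2} I\bigl(P_X, W_{\mathcal{D}(T_1,T_2)}\bigr) \tag{80} \]

where the quantizer $\mathcal{D}(T_1, T_2)$ is defined as

\[ \mathcal{D}(T_1, T_2) \triangleq \{ y \in \mathbb{R} : T_1 \le y < T_2 \}, \quad T_1 \le T_2 \tag{81} \]

with

\[ \mathcal{D}(-\infty, T_2) \triangleq \{ y \in \mathbb{R} : y < T_2 \}, \quad T_2 \in \mathbb{R} \tag{82a} \]
\[ \mathcal{D}(T_1, \infty) \triangleq \{ y \in \mathbb{R} : y \ge T_1 \}, \quad T_1 \in \mathbb{R} \tag{82b} \]
\[ \mathcal{D}(-\infty, \infty) \triangleq \mathbb{R} \tag{82c} \]
\[ \mathcal{D}(-\infty, -\infty) = \mathcal{D}(\infty, \infty) \triangleq \emptyset. \tag{82d} \]

(Here $\emptyset$ denotes the empty set.) Needless to say, the case
$T_1 = T_2$ and the forms (82c) and (82d) yield zero mutual
information and are thus uninteresting.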
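Define

\[ \mathcal{W} \triangleq \bigl\{ (\omega_1, \omega_2, \omega_3) \in [0,1]^3 : \omega_\ell = \Pr(\tilde{Y} \in \mathcal{D} \mid X = \xi_\ell),\ \ell = 1, 2, 3, \text{ for some Borel set } \mathcal{D} \subseteq \mathbb{R} \bigr\} \]

to be the set of possible channel laws that different quantizers
can induce for the inputs $(\xi_1, \xi_2, \xi_3)$, and define $\overline{\mathcal{W}}$ to be the
closure of the convex hull of $\mathcal{W}$. With this notation

\[ \sup_{\mathcal{D}} I(P_X, W_{\mathcal{D}}) = \sup_{W \in \mathcal{W}} I(P_X, W) \le \sup_{W \in \overline{\mathcal{W}}} I(P_X, W) \tag{83} \]

where the second step follows because $\mathcal{W} \subseteq \overline{\mathcal{W}}$. Recall that
an extreme point of $\overline{\mathcal{W}}$ is a channel in $\overline{\mathcal{W}}$ that cannot be
written as a convex combination of two different channels
in $\overline{\mathcal{W}}$. By the Krein-Milman theorem [24, Cor. 18.5.1], every
channel law $W \in \overline{\mathcal{W}}$ can be written as a convex combination
of extreme points of $\overline{\mathcal{W}}$. Since mutual information is convex in
the channel law (when the input distribution is held fixed) [10,
Th. 2.7.4], it follows that on the RHS of (83) we can replace
the supremum over the set $\overline{\mathcal{W}}$ with the supremum over its
extreme points.

We next show that the extreme points of $\overline{\mathcal{W}}$ correspond to
quantizers of the form (81). Once we show this, it will follow
that (83) holds with equality because these extreme points of
$\overline{\mathcal{W}}$ are in fact in $\mathcal{W}$. This will prove (80).†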

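To prove that the extreme points of $\overline{\mathcal{W}}$ are indeed the
channel laws corresponding to quantizers of the form (81),
we consider the support function of $\overline{\mathcal{W}}$ [24, Sec. 13]

\[ f(\boldsymbol{\lambda}) \triangleq \sup_{(\omega_1, \omega_2, \omega_3) \in \overline{\mathcal{W}}} \{ \lambda_1 \omega_1 + \lambda_2 \omega_2 + \lambda_3 \omega_3 \} \tag{84} \]

for $\boldsymbol{\lambda} = (\lambda_1, \lambda_2, \lambda_3) \in \mathbb{R}^3$. Since $\overline{\mathcal{W}}$ is the closure of all
convex combinations of the elements of $\mathcal{W}$ [24, Th. 2.3], the
support function of $\overline{\mathcal{W}}$ is the same as that of $\mathcal{W}$ and

\[ f(\boldsymbol{\lambda}) = \sup_{\mathcal{D}} \{ \lambda_1 \omega_1(\mathcal{D}) + \lambda_2 \omega_2(\mathcal{D}) + \lambda_3 \omega_3(\mathcal{D}) \} \tag{85} \]

where

\[ \omega_\ell(\mathcal{D}) \triangleq \Pr(\tilde{Y} \in \mathcal{D} \mid X = \xi_\ell), \quad \ell = 1, 2, 3. \tag{86} \]

We rewrite (85) as

\[ f(\boldsymbol{\lambda}) = \sup_{\mathcal{D}} \frac{1}{\sqrt{2\pi\sigma^2}} \int_{\mathcal{D}} g_{\boldsymbol{\lambda}}(y) \, dy \tag{87} \]

where

\[ g_{\boldsymbol{\lambda}}(y) \triangleq \lambda_1 e^{-\frac{(y - \xi_1)^2}{2\sigma^2}} + \lambda_2 e^{-\frac{(y - \xi_2)^2}{2\sigma^2}} + \lambda_3 e^{-\frac{(y - \xi_3)^2}{2\sigma^2}}, \quad y \in \mathbb{R}. \tag{88} \]

The integral on the RHS of (87) is maximized when $\mathcal{D}$ is the
set

\[ \mathcal{D}^*(\boldsymbol{\lambda}) \triangleq \{ y \in \mathbb{R} : g_{\boldsymbol{\lambda}}(y) \ge 0 \}. \tag{89} \]

The structure of $\mathcal{D}^*(\boldsymbol{\lambda})$ depends on the zeros of $g_{\boldsymbol{\lambda}}(\cdot)$, which
we proceed to study.

*Every two-mass-points distribution can, of course, be viewed as a
three-mass-points distribution with one of the masses being zero.

†Note that $\overline{\mathcal{W}}$ is the set of possible channel laws that different quantizers
can induce for the inputs $(\xi_1, \xi_2, \xi_3)$, provided that we allow for randomized
quantization rules. It thus follows that (80) continues to hold if on the LHS,
instead of maximizing over all deterministic quantizers $\mathcal{D}$, we maximize over
all conditional distributions $P_{Y|\tilde{Y}}$ with binary $Y$.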

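Our study of the zeros of $g_{\boldsymbol{\lambda}}(\cdot)$ depends on the signs of
$\lambda_1, \lambda_2, \lambda_3$ and on how many of them are zero. The case where
$\lambda_1, \lambda_2, \lambda_3$ are all zero is trivial, because in this case $f(\boldsymbol{\lambda})$ is
zero irrespective of $\mathcal{D}$. We will see that in all other cases the
set $\mathcal{D}$ that achieves $f(\boldsymbol{\lambda})$ is unique up to Lebesgue measure
zero. If exactly two $\lambda$'s, say $\lambda_1$ and $\lambda_2$, are zero, then the set
$\mathcal{D}$ that achieves $f(\boldsymbol{\lambda})$ is either $\mathbb{R}$ or $\emptyset$, depending on whether
$\lambda_3$ is positive or negative. We next consider the case where
exactly one of the $\lambda$'s, say $\lambda_3$, is zero. In this case

\[ g_{\boldsymbol{\lambda}}(y) = \lambda_1 e^{-\frac{(y - \xi_1)^2}{2\sigma^2}} + \lambda_2 e^{-\frac{(y - \xi_2)^2}{2\sigma^2}}, \quad y \in \mathbb{R} \tag{90} \]

which is either positive (if $\lambda_1 > 0$ and $\lambda_2 > 0$), negative (if
$\lambda_1 < 0$ and $\lambda_2 < 0$), or has an isolated zero at

\[ y = \frac{\xi_1 + \xi_2}{2} + \frac{\sigma^2}{\xi_2 - \xi_1} \log\Bigl( -\frac{\lambda_1}{\lambda_2} \Bigr) \tag{91} \]

(if $\lambda_1$ and $\lambda_2$ have opposite signs). Consequently, if exactly
one of the $\lambda$'s is zero, then the set $\mathcal{D}$ that achieves $f(\boldsymbol{\lambda})$ is
either the entire real line, the empty set, or a ray, i.e., of the
form $(-\infty, T)$ or $(T, \infty)$, where $T$ is the RHS of (91).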
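The location of the isolated zero can be checked numerically. The sketch below assumes the two-term form of $g_{\boldsymbol{\lambda}}$ in (90) and solves $g_{\boldsymbol{\lambda}}(y) = 0$ in closed form by taking logarithms; the function names and the test values are illustrative only.

```python
import math


def g_two(y, lam1, lam2, xi1, xi2, sigma=1.0):
    """Two-term g_lambda(y): a signed mixture of two Gaussian kernels."""
    return (lam1 * math.exp(-(y - xi1) ** 2 / (2.0 * sigma ** 2))
            + lam2 * math.exp(-(y - xi2) ** 2 / (2.0 * sigma ** 2)))


def isolated_zero(lam1, lam2, xi1, xi2, sigma=1.0):
    """Unique zero of g_two when lam1 and lam2 have opposite signs
    and the two mass points are distinct."""
    if lam1 * lam2 >= 0.0:
        raise ValueError("lam1 and lam2 must have opposite signs")
    if xi1 == xi2:
        raise ValueError("mass points must be distinct")
    # Setting lam1*exp(-(y-xi1)^2/(2s^2)) = -lam2*exp(-(y-xi2)^2/(2s^2))
    # and taking logs gives an equation that is linear in y.
    return 0.5 * (xi1 + xi2) + sigma ** 2 / (xi2 - xi1) * math.log(-lam1 / lam2)
```

With equal magnitudes ($\lambda_1 = -\lambda_2$) the zero is the midpoint of the two mass points; unequal magnitudes shift the resulting threshold away from the midpoint.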

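We finally turn to the case where all the $\lambda$'s are nonzero.
If they are all of equal sign, then $g_{\boldsymbol{\lambda}}(\cdot)$ has no zeros and the
set $\mathcal{D}$ that achieves $f(\boldsymbol{\lambda})$ is either the entire real line $\mathbb{R}$ or
the empty set, depending on whether the $\lambda$'s are all positive
or all negative. It remains to study the case where the $\lambda$'s are
nonzero but not of equal sign. Changing the sign of all the
$\lambda$'s is tantamount to multiplying $g_{\boldsymbol{\lambda}}(\cdot)$ by $-1$ and therefore
does not change the locations of the zeros, so we can assume
without loss of generality that one of the $\lambda$'s, say $\lambda_1$, is positive
and that the remaining two $\lambda_2, \lambda_3$ are negative. In this case

\[ g_{\boldsymbol{\lambda}}(y) = \lambda_1 e^{-\frac{(y - \xi_1)^2}{2\sigma^2}} \, h_{\boldsymbol{\lambda}}(y), \quad y \in \mathbb{R} \tag{92} \]

where

\[ h_{\boldsymbol{\lambda}}(y) \triangleq 1 + \frac{\lambda_2}{\lambda_1} e^{\frac{\xi_1^2 - \xi_2^2}{2\sigma^2}} e^{\frac{y(\xi_2 - \xi_1)}{\sigma^2}} + \frac{\lambda_3}{\lambda_1} e^{\frac{\xi_1^2 - \xi_3^2}{2\sigma^2}} e^{\frac{y(\xi_3 - \xi_1)}{\sigma^2}}, \quad y \in \mathbb{R}. \tag{93} \]

Note that the zeros of $g_{\boldsymbol{\lambda}}(\cdot)$ are the same as the zeros of $h_{\boldsymbol{\lambda}}(\cdot)$.
Further note that $h_{\boldsymbol{\lambda}}(\cdot)$ is a nonzero analytic function whose
second derivative

\[ h_{\boldsymbol{\lambda}}''(y) = \frac{\lambda_2}{\lambda_1} \frac{(\xi_2 - \xi_1)^2}{\sigma^4} e^{\frac{\xi_1^2 - \xi_2^2}{2\sigma^2}} e^{\frac{y(\xi_2 - \xi_1)}{\sigma^2}} + \frac{\lambda_3}{\lambda_1} \frac{(\xi_3 - \xi_1)^2}{\sigma^4} e^{\frac{\xi_1^2 - \xi_3^2}{2\sigma^2}} e^{\frac{y(\xi_3 - \xi_1)}{\sigma^2}}, \quad y \in \mathbb{R} \tag{94} \]

is strictly negative. Consequently, $h_{\boldsymbol{\lambda}}(\cdot)$, and hence also
$g_{\boldsymbol{\lambda}}(\cdot)$, can have at most two zeros. (If it had three or more,
then by Rolle's Theorem its derivative would have at least two
zeros, and its second derivative would therefore have a zero in
contradiction to (94).) If $h_{\boldsymbol{\lambda}}(\cdot)$ has at most one zero, then the
set $\mathcal{D}$ achieving $f(\boldsymbol{\lambda})$ is either the entire real line, the empty
set, or a ray. If it has two zeros, then $\mathcal{D}$ comprises two disjoint
rays or else a finite interval; either way, $\mathcal{D}$ or its complement
is a finite interval.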
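The bound of at most two zeros can be illustrated by counting sign changes of $g_{\boldsymbol{\lambda}}$ on a fine grid. This is only a numerical sanity check under illustrative parameter values (the grid range, step count, and the two example configurations are assumptions), not a substitute for the Rolle's Theorem argument.

```python
import math


def g_lambda(y, lam, xi, sigma=1.0):
    """Three-term g_lambda(y) for weights lam and mass points xi."""
    return sum(l * math.exp(-(y - x) ** 2 / (2.0 * sigma ** 2))
               for l, x in zip(lam, xi))


def count_sign_changes(lam, xi, sigma=1.0, lo=-20.0, hi=20.0, steps=40001):
    """Count sign changes of g_lambda on a uniform grid over [lo, hi].
    Grid points where g_lambda evaluates to exactly zero are skipped."""
    changes = 0
    prev_sign = 0
    for i in range(steps):
        y = lo + (hi - lo) * i / (steps - 1)
        v = g_lambda(y, lam, xi, sigma)
        sign = (v > 0) - (v < 0)
        if sign != 0:
            if prev_sign != 0 and sign != prev_sign:
                changes += 1
            prev_sign = sign
    return changes
```

For example, with the positive weight on the middle mass point, `count_sign_changes((1.0, -0.3, -0.3), (0.0, -2.0, 2.0))` finds two sign changes, so the set where $g_{\boldsymbol{\lambda}} \ge 0$ is a finite interval; with the positive weight on an outer mass point there is a single sign change and the set is a ray.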

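We next show that for every $\boldsymbol{\lambda}$ the quantization region
achieving $f(\boldsymbol{\lambda})$ is unique up to sets of Lebesgue measure zero.
Let $\mathcal{D}^*(\boldsymbol{\lambda})$ be the quantization region that achieves $f(\boldsymbol{\lambda})$, and
let $\mathcal{D}_1$ be any other quantization region. Then

\[ \int_{\mathcal{D}^*(\boldsymbol{\lambda})} g_{\boldsymbol{\lambda}}(y) \, dy - \int_{\mathcal{D}_1} g_{\boldsymbol{\lambda}}(y) \, dy = \int_{\mathcal{D}^*(\boldsymbol{\lambda}) \cap \mathcal{D}_1^c} g_{\boldsymbol{\lambda}}(y) \, dy - \int_{\mathcal{D}_1 \cap \mathcal{D}^*(\boldsymbol{\lambda})^c} g_{\boldsymbol{\lambda}}(y) \, dy \ge \int_{\mathcal{D}^*(\boldsymbol{\lambda}) \cap \mathcal{D}_1^c} g_{\boldsymbol{\lambda}}(y) \, dy \ge 0 \tag{95} \]

where the second step follows because for every $y \in \mathcal{D}^*(\boldsymbol{\lambda})^c$
we have $g_{\boldsymbol{\lambda}}(y) < 0$; and the last step follows because for
every $y \in \mathcal{D}^*(\boldsymbol{\lambda})$ we have $g_{\boldsymbol{\lambda}}(y) \ge 0$. (Here $\mathcal{A}^c$ denotes
the complement of the set $\mathcal{A}$.) Furthermore, since the zeros
of $g_{\boldsymbol{\lambda}}(\cdot)$ are isolated, it is nonzero almost everywhere, so the
inequalities hold with equality if, and only if, $\mathcal{D}^*(\boldsymbol{\lambda}) \cap \mathcal{D}_1^c$ and
$\mathcal{D}_1 \cap \mathcal{D}^*(\boldsymbol{\lambda})^c$ have both Lebesgue measure zero.

Because quantizers that differ on a set of Lebesgue measure
zero induce identical channel laws, the uniqueness (up to sets
of Lebesgue measure zero) of the set $\mathcal{D}$ achieving $f(\boldsymbol{\lambda})$ (for
$\boldsymbol{\lambda} \neq \mathbf{0}$) implies that for every $\boldsymbol{\lambda} \neq \mathbf{0}$ the tuple $(\omega_1, \omega_2, \omega_3)$
that achieves $f(\boldsymbol{\lambda})$ is unique.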

We next note that, by [24, Th. 13.1], every $(\omega_1, \omega_2, \omega_3) \in \overline{\mathcal{W}}$
satisfying

\[ \lambda_1 \omega_1 + \lambda_2 \omega_2 + \lambda_3 \omega_3 < f(\boldsymbol{\lambda}), \quad \text{for every } \boldsymbol{\lambda} \neq \mathbf{0} \tag{96} \]

must be an interior point of $\overline{\mathcal{W}}$. Since an interior point cannot
be an extreme point, it follows that every extreme point of a
compact convex set achieves the supremum defining $f(\boldsymbol{\lambda})$ at
some $\boldsymbol{\lambda} \neq \mathbf{0}$. Furthermore, since for a given $\boldsymbol{\lambda} \neq \mathbf{0}$ the support
function $f(\boldsymbol{\lambda})$ is achieved uniquely by channel laws that are
induced by quantizers of the form (81) or their complement,
it follows that the extreme points of $\overline{\mathcal{W}}$ are all achieved by
quantizers of this form or their complement. Recalling that
mutual information is maximized over $\overline{\mathcal{W}}$ (for a given input
distribution) at an extreme point, and noting that the mutual
information corresponding to the quantizer $\mathcal{D}$ is the same as
that corresponding to its complement, we conclude that, for
any fixed three-mass-points input distribution, the supremum
over all quantizers can be replaced with the supremum over
all quantizers of the form (81), thus proving (80).



C. The Supremum Defining C(P) Is Achieved 

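Having established that to each quantizer the optimal input
distribution is of three mass points, and having established
that to each three-mass-points input distribution the optimal
quantizer is of the form (81), we conclude that we can express
$C(P)$ of (6) as

\[ C(P) = \sup_{(\mathbf{p}, \boldsymbol{\xi}):\ E[X^2] \le P,\ T_1 \le T_2} I\bigl(\mathbf{p}, W(T_1, T_2 | \boldsymbol{\xi})\bigr) \tag{97} \]

where $(\mathbf{p}, \boldsymbol{\xi})$ denotes the three-mass-points distribution of
masses

\[ \mathbf{p} = (p_1, p_2, p_3) \in [0,1]^3 \]

and locations

\[ \boldsymbol{\xi} = (\xi_1, \xi_2, \xi_3) \in \mathbb{R}^3 \]

and where $W(T_1, T_2 | \boldsymbol{\xi})$ denotes the channel law corresponding
to the quantizer $\mathcal{D}(T_1, T_2)$ and to the mass points $\xi_\ell$,
$\ell = 1, 2, 3$:

\[ W(T_1, T_2 | \xi_\ell) \triangleq \Pr\bigl(\tilde{Y} \in \mathcal{D}(T_1, T_2) \mid X = \xi_\ell\bigr). \tag{98} \]

We next show that this supremum is achieved.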
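The objective in (97) is straightforward to evaluate numerically. The sketch below assumes the real Gaussian model $\tilde{Y} = X + Z$ with $Z \sim \mathcal{N}(0, \sigma^2)$, so that $W(T_1, T_2 | \xi) = Q((T_1 - \xi)/\sigma) - Q((T_2 - \xi)/\sigma)$, and computes the mutual information of the resulting binary-output channel in nats. The function names are illustrative.

```python
import math


def q_func(x: float) -> float:
    """Gaussian tail probability Q(x); handles +/- infinity."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def binary_entropy(p: float) -> float:
    """Binary entropy in nats, with H_b(0) = H_b(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def interval_prob(t1, t2, xi, sigma=1.0):
    """W(T1, T2 | xi): probability that xi + Z falls in [T1, T2)."""
    return q_func((t1 - xi) / sigma) - q_func((t2 - xi) / sigma)


def mutual_information(p, xi, t1, t2, sigma=1.0):
    """I(p, W(T1, T2 | xi)) = H_b(sum_l p_l W_l) - sum_l p_l H_b(W_l)."""
    w = [interval_prob(t1, t2, x, sigma) for x in xi]
    avg = sum(pl * wl for pl, wl in zip(p, w))
    return binary_entropy(avg) - sum(pl * binary_entropy(wl)
                                     for pl, wl in zip(p, w))
```

Passing `t2 = math.inf` gives a threshold quantizer of the form (82b); a mass point with zero mass does not contribute to the result.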

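By the definition of the supremum, there exists a sequence
$\{ (\mathbf{p}_i, \boldsymbol{\xi}_i, T_{1,i}, T_{2,i}),\ i \in \mathbb{N} \}$ (where $\mathbb{N}$ denotes the set of
positive integers) such that

\[ \lim_{i \to \infty} I\bigl(\mathbf{p}_i, W(T_{1,i}, T_{2,i} | \boldsymbol{\xi}_i)\bigr) = C(P). \tag{99} \]

By taking a subsequence (if needed), we may assume without
loss of generality that $\mathbf{p}_i$ converges to some $\mathbf{p}^*$, that $\boldsymbol{\xi}_i$
converges to some $\boldsymbol{\xi}^*$ (whose components may be $\pm\infty$) and
that $T_{1,i}$ and $T_{2,i}$ converge to $T_1^*$ and $T_2^*$, both of which may
be $\pm\infty$. From the continuity of the cumulative distribution
function of the Normal distribution, it follows that, whenever
$\xi_\ell^*$ is finite,

\[ \lim_{i \to \infty} \Pr\bigl(T_{1,i} \le \xi_{\ell,i} + Z < T_{2,i}\bigr) = \Pr\bigl(T_1^* \le \xi_\ell^* + Z < T_2^*\bigr) \tag{100} \]

where we recall that $Z$ is a centered Gaussian of positive
variance $\sigma^2$.

Since the mass corresponding to nonfinite locations $\xi_\ell^*$
is zero (by the average-power constraint), and since $p_{\ell,i}$
converges to $p_\ell^*$, (100) and the continuity of the binary entropy
function allow us to infer that

\[ \lim_{i \to \infty} I\bigl(\mathbf{p}_i, W(T_{1,i}, T_{2,i} | \boldsymbol{\xi}_i)\bigr) = \lim_{i \to \infty} \Bigl\{ H_b\Bigl( \sum_{\ell=1}^{3} p_{\ell,i} W(T_{1,i}, T_{2,i} | \xi_{\ell,i}) \Bigr) - \sum_{\ell=1}^{3} p_{\ell,i} H_b\bigl( W(T_{1,i}, T_{2,i} | \xi_{\ell,i}) \bigr) \Bigr\} = I\bigl(\mathbf{p}^*, W(T_1^*, T_2^* | \boldsymbol{\xi}^*)\bigr) \tag{101} \]

which combines with (99) to imply

\[ I\bigl(\mathbf{p}^*, W(T_1^*, T_2^* | \boldsymbol{\xi}^*)\bigr) = C(P) \tag{102} \]

provided that in computing the mutual information on the LHS
of (102) the mass points of zero mass are ignored. Noting that
the mass points at $\pm\infty$ are of zero mass and therefore ignored,
we conclude that $C(P)$ is achieved by an input distribution of
(at most) three finite mass points and by a quantizer of the
form (81).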

D. A Threshold Quantizer Is Optimal 

Having established that $C(P)$ is achieved by a three-mass-points
input distribution and a quantizer of the form (81), we
now prove that $C(P)$ is in fact achieved by a three-mass-points
input distribution and a threshold quantizer, i.e., a quantizer of
the form (82b). Clearly $T_1$ and $T_2$ cannot be both nonfinite,
as this would result in zero mutual information, whereas $C(P)$
is strictly positive whenever $P$ is positive

\[ C(P) > 0, \quad P > 0. \tag{103} \]

(This can be verified by noting that a symmetric threshold
quantizer and an equiprobable $\pm\sqrt{P}$ input distribution yield
positive mutual information for every positive $P$, cf. (21).) For
the same reason we can assume, without loss of optimality, that
$T_1 \neq T_2$. Since (82a) is the complement of a set of the form
(82b), which gives rise to the same mutual information, it
remains to rule out the case where $T_1$ and $T_2$ are both finite.

We shall prove this by contradiction. We shall assume that
the quantization region $\mathcal{D}(T_1, T_2)$ for some finite $T_1 < T_2$ is
optimal and derive a contradiction to optimality. Assume then
that $T_1$ and $T_2$ are both finite with $T_1 < T_2$. Define

\[ \theta \triangleq \frac{T_1 + T_2}{2}. \tag{104} \]

Let $\boldsymbol{\xi}$ be the mass points of the capacity-achieving input
distribution, and let $\mathbf{p}$ be the corresponding probabilities.
There is no loss in optimality in assuming that $\theta$ is nonnegative

\[ \theta \ge 0 \tag{105} \]

because if $\theta$ is negative, then we can consider the input $(\mathbf{p}, -\boldsymbol{\xi})$
(whose second moment is identical to that of $(\mathbf{p}, \boldsymbol{\xi})$) and the
quantizer $\mathcal{D}(-T_2, -T_1)$ (whose midpoint is of opposite sign
to that of $\mathcal{D}(T_1, T_2)$) which give rise to the same mutual
information as the input $(\mathbf{p}, \boldsymbol{\xi})$ and the quantizer $\mathcal{D}(T_1, T_2)$.

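We continue by noting that the symmetry of the Normal
distribution implies that

\[ W(T_1, T_2 | \theta - \delta) = W(T_1, T_2 | \theta + \delta), \quad \delta \ge 0. \tag{106} \]

Indeed, denoting $T_1 = \theta - \Delta$ and $T_2 = \theta + \Delta$ (hence
$\Delta = (T_2 - T_1)/2$) we have

\[ W(T_1, T_2 | \theta - \delta) = \int_{\theta - \Delta}^{\theta + \Delta} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(y - \theta + \delta)^2}{2\sigma^2}} \, dy = \int_{\theta - \Delta}^{\theta + \Delta} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(t - \theta - \delta)^2}{2\sigma^2}} \, dt = W(T_1, T_2 | \theta + \delta) \tag{107} \]

where we made the substitution $t = -y + 2\theta$. Since $\theta \ge 0$,

\[ (\theta - \delta)^2 \le (\theta + \delta)^2, \quad \delta \ge 0. \tag{108} \]

As we next argue, (106) and (108) imply that there is no loss
in optimality in assuming that

\[ \xi_1 < \xi_2 < \xi_3 \le \theta. \tag{109} \]

Indeed, suppose $\xi_3 > \theta$. Then it can be written as $\theta + \delta$,
for some $\delta > 0$. However, $\xi_3' = \theta - \delta$ gives rise to the same
channel law (106) but has a smaller cost (108). Thus, for every
$\xi_3 > \theta$ we can find a $\xi_3' < \theta$ satisfying the power constraint
that achieves the same rate.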
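The reflection argument can be illustrated with a short sketch. It assumes the real Gaussian model, so that $W(T_1, T_2 | \xi) = Q((T_1 - \xi)/\sigma) - Q((T_2 - \xi)/\sigma)$; the helper `reflect_below_midpoint` is a hypothetical name for the map $\theta + \delta \mapsto \theta - \delta$ used in the text.

```python
import math


def q_func(x: float) -> float:
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def interval_prob(t1, t2, xi, sigma=1.0):
    """W(T1, T2 | xi): probability that xi + Z falls in [T1, T2)."""
    return q_func((t1 - xi) / sigma) - q_func((t2 - xi) / sigma)


def reflect_below_midpoint(xi, t1, t2):
    """Mirror a mass point lying above the midpoint theta of [T1, T2]
    to the point at the same distance below theta; points that are
    already at or below theta are returned unchanged."""
    theta = 0.5 * (t1 + t2)
    return 2.0 * theta - xi if xi > theta else xi
```

For a quantizer with nonnegative midpoint, the reflected point induces the same transition probability and never has a larger square, which is the content of (106) and (108).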

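We next show that (109) leads to a contradiction by considering
a perturbation of the quantizer. For every $\Gamma > T_2$ define
the perturbed quantization region

\[ \mathcal{D}_\Gamma \triangleq \mathcal{D}(T_1, T_2) \cup [\Gamma, +\infty) \tag{110} \]

and denote the channel law corresponding to $\mathcal{D}_\Gamma$ and $\boldsymbol{\xi}$ by
$W(\mathcal{D}_\Gamma | \boldsymbol{\xi})$:

\[ W(\mathcal{D}_\Gamma | \xi_\ell) \triangleq \Pr(\tilde{Y} \in \mathcal{D}_\Gamma \mid X = \xi_\ell) = W(T_1, T_2 | \xi_\ell) + Q\Bigl( \frac{\Gamma - \xi_\ell}{\sigma} \Bigr) \tag{111} \]

for $\Gamma > T_2$ and $\ell = 1, 2, 3$. We will contradict the optimality
of the input $(\mathbf{p}, \boldsymbol{\xi})$ and the quantizer $\mathcal{D}(T_1, T_2)$ by showing
that for $(\mathbf{p}, \boldsymbol{\xi})$ satisfying (109), we can find a sufficiently large
$\Gamma$ exceeding $T_2$ such that

\[ I\bigl(\mathbf{p}, W(\mathcal{D}_\Gamma | \boldsymbol{\xi})\bigr) > I\bigl(\mathbf{p}, W(T_1, T_2 | \boldsymbol{\xi})\bigr). \tag{112} \]

To show this we use (111) to express the mutual information
on the LHS of (112) as

\[ I\bigl(\mathbf{p}, W(\mathcal{D}_\Gamma | \boldsymbol{\xi})\bigr) = H_b\bigl( P(T_1, T_2) + P(\Gamma) \bigr) - \sum_{\ell=1}^{3} p_\ell \, H_b\Bigl( W(T_1, T_2 | \xi_\ell) + Q\Bigl( \frac{\Gamma - \xi_\ell}{\sigma} \Bigr) \Bigr) \tag{113} \]

where we denote

\[ P(T_1, T_2) \triangleq \sum_{\ell=1}^{3} p_\ell \, W(T_1, T_2 | \xi_\ell) \tag{114a} \]
\[ P(\Gamma) \triangleq \sum_{\ell=1}^{3} p_\ell \, Q\Bigl( \frac{\Gamma - \xi_\ell}{\sigma} \Bigr). \tag{114b} \]

A Taylor series expansion of $H_b(p + \epsilon)$ around $p$ yields

\[ H_b(p + \epsilon) = H_b(p) + \epsilon \log \frac{1 - p}{p} + R(p, \epsilon) \tag{115} \]

for $0 < p < 1 - \epsilon$ and some remainder $R(p, \epsilon)$ satisfying

\[ |R(p, \epsilon)| \le \frac{\epsilon^2}{2} \, \frac{1}{p (1 - p - \epsilon)}. \tag{116} \]

With this, we obtain

\[ I\bigl(\mathbf{p}, W(\mathcal{D}_\Gamma | \boldsymbol{\xi})\bigr) = H_b\bigl(P(T_1, T_2)\bigr) + P(\Gamma) \log \frac{1 - P(T_1, T_2)}{P(T_1, T_2)} - \sum_{\ell=1}^{3} p_\ell \, H_b\bigl( W(T_1, T_2 | \xi_\ell) \bigr) - \sum_{\ell=1}^{3} p_\ell \, Q\Bigl( \frac{\Gamma - \xi_\ell}{\sigma} \Bigr) \log \frac{1 - W(T_1, T_2 | \xi_\ell)}{W(T_1, T_2 | \xi_\ell)} + K(\mathbf{p}, \boldsymbol{\xi}, \Gamma) \]
\[ = I\bigl(\mathbf{p}, W(T_1, T_2 | \boldsymbol{\xi})\bigr) + P(\Gamma) \log \frac{1 - P(T_1, T_2)}{P(T_1, T_2)} - \sum_{\ell=1}^{3} p_\ell \, Q\Bigl( \frac{\Gamma - \xi_\ell}{\sigma} \Bigr) \log \frac{1 - W(T_1, T_2 | \xi_\ell)}{W(T_1, T_2 | \xi_\ell)} + K(\mathbf{p}, \boldsymbol{\xi}, \Gamma) \tag{117} \]

where

\[ K(\mathbf{p}, \boldsymbol{\xi}, \Gamma) \triangleq R\bigl( P(T_1, T_2), P(\Gamma) \bigr) - \sum_{\ell=1}^{3} p_\ell \, R\Bigl( W(T_1, T_2 | \xi_\ell), Q\Bigl( \frac{\Gamma - \xi_\ell}{\sigma} \Bigr) \Bigr). \tag{118} \]

Since the LHS of (111) is strictly smaller than 1 so is its RHS
and it thus follows upon averaging over $\mathbf{p}$ that for every $P > 0$
and every $T_1 < T_2 < \Gamma$

\[ P(T_1, T_2) < 1 - P(\Gamma). \tag{119} \]

Furthermore, $P(T_1, T_2)$ is strictly positive since
$W(T_1, T_2 | \xi_\ell) > 0$ for $\ell = 1, 2, 3$. Using (116), it thus
follows that

\[ \lim_{\Gamma \to \infty} \frac{\bigl| R\bigl( P(T_1, T_2), P(\Gamma) \bigr) \bigr|}{Q\bigl( \frac{\Gamma - \xi_3}{\sigma} \bigr)} \le \lim_{\Gamma \to \infty} \frac{[P(\Gamma)]^2}{Q\bigl( \frac{\Gamma - \xi_3}{\sigma} \bigr) \, 2 P(T_1, T_2) \bigl( 1 - P(T_1, T_2) - P(\Gamma) \bigr)} \le \lim_{\Gamma \to \infty} \frac{Q\bigl( \frac{\Gamma - \xi_3}{\sigma} \bigr)}{2 P(T_1, T_2) \bigl( 1 - P(T_1, T_2) - P(\Gamma) \bigr)} = 0 \tag{120} \]

where the second step follows because $\xi_1 < \xi_2 < \xi_3$, which
implies that $P(\Gamma) \le Q\bigl( \frac{\Gamma - \xi_3}{\sigma} \bigr)$;
and where the last step follows because $P(\Gamma)$ and
$Q((\Gamma - \xi_3)/\sigma)$ both tend to zero as $\Gamma$ tends to infinity. Along
the same lines, it can be shown that for $\ell = 1, 2, 3$

\[ \lim_{\Gamma \to \infty} \frac{\Bigl| R\Bigl( W(T_1, T_2 | \xi_\ell), Q\bigl( \frac{\Gamma - \xi_\ell}{\sigma} \bigr) \Bigr) \Bigr|}{Q\bigl( \frac{\Gamma - \xi_3}{\sigma} \bigr)} = 0. \tag{121} \]

It thus follows from (118), (120), (121), and the Triangle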
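Inequality that

\[ \lim_{\Gamma \to \infty} \frac{|K(\mathbf{p}, \boldsymbol{\xi}, \Gamma)|}{Q\bigl( \frac{\Gamma - \xi_3}{\sigma} \bigr)} \le \lim_{\Gamma \to \infty} \frac{\bigl| R\bigl( P(T_1, T_2), P(\Gamma) \bigr) \bigr|}{Q\bigl( \frac{\Gamma - \xi_3}{\sigma} \bigr)} + \sum_{\ell=1}^{3} p_\ell \lim_{\Gamma \to \infty} \frac{\Bigl| R\Bigl( W(T_1, T_2 | \xi_\ell), Q\bigl( \frac{\Gamma - \xi_\ell}{\sigma} \bigr) \Bigr) \Bigr|}{Q\bigl( \frac{\Gamma - \xi_3}{\sigma} \bigr)} = 0. \tag{122} \]

We further have by [14, Prop. 19.4.2] that for $\ell = 1, 2$

\[ \lim_{\Gamma \to \infty} \frac{Q\bigl( \frac{\Gamma - \xi_\ell}{\sigma} \bigr)}{Q\bigl( \frac{\Gamma - \xi_3}{\sigma} \bigr)} = 0 \tag{123} \]

because $\xi_\ell < \xi_3$ and $Q(\alpha)$ decays like $e^{-\alpha^2/2}/\alpha$ as $\alpha$ tends to infinity.
We thus obtain from (114b), (117), (122), and (123) that

\[ \lim_{\Gamma \to \infty} \frac{I\bigl(\mathbf{p}, W(\mathcal{D}_\Gamma | \boldsymbol{\xi})\bigr) - I\bigl(\mathbf{p}, W(T_1, T_2 | \boldsymbol{\xi})\bigr)}{Q\bigl( \frac{\Gamma - \xi_3}{\sigma} \bigr)} = p_3 \log \frac{1 - P(T_1, T_2)}{P(T_1, T_2)} - p_3 \log \frac{1 - W(T_1, T_2 | \xi_3)}{W(T_1, T_2 | \xi_3)} = p_3 \log \Bigl( \frac{W(T_1, T_2 | \xi_3)}{1 - W(T_1, T_2 | \xi_3)} \cdot \frac{1 - P(T_1, T_2)}{P(T_1, T_2)} \Bigr) > 0 \tag{124} \]

where the inequality follows by noting that

\[ \xi \mapsto W(T_1, T_2 | \xi) \]

is strictly increasing on $(-\infty, \theta)$ (see Appendix II), so*

\[ W(T_1, T_2 | \xi_3) > P(T_1, T_2). \tag{125} \]

It follows from (124) that, for a sufficiently large $\Gamma$,
$I(\mathbf{p}, W(\mathcal{D}_\Gamma | \boldsymbol{\xi}))$ is strictly larger than $I(\mathbf{p}, W(T_1, T_2 | \boldsymbol{\xi}))$,
contradicting the assumption that $\mathcal{D}(T_1, T_2)$ with finite
$T_1 < T_2$ achieves $C(P)$.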
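The perturbation argument can be observed numerically. The sketch below assumes the real Gaussian model and one illustrative configuration (masses, mass points at or below the midpoint, and a finite interval); it compares the mutual information before and after adding the ray $[\Gamma, \infty)$ to the quantization region, and evaluates the limiting slope that appears in (124). All names and numerical values are illustrative assumptions.

```python
import math


def q_func(x: float) -> float:
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def binary_entropy(p: float) -> float:
    """Binary entropy in nats, with H_b(0) = H_b(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def interval_prob(t1, t2, xi, sigma=1.0):
    """W(T1, T2 | xi) for the interval quantizer."""
    return q_func((t1 - xi) / sigma) - q_func((t2 - xi) / sigma)


def mi_from_laws(p, w):
    """Mutual information of a binary-output channel with laws w."""
    avg = sum(pl * wl for pl, wl in zip(p, w))
    return binary_entropy(avg) - sum(pl * binary_entropy(wl)
                                     for pl, wl in zip(p, w))


def perturbation_gain(p, xi, t1, t2, gamma, sigma=1.0):
    """Change in mutual information when [gamma, inf) is added to the
    quantization region, using the additive form of (111)."""
    base = [interval_prob(t1, t2, x, sigma) for x in xi]
    pert = [w + q_func((gamma - x) / sigma) for w, x in zip(base, xi)]
    return mi_from_laws(p, pert) - mi_from_laws(p, base)


def limiting_slope(p, xi, t1, t2, sigma=1.0):
    """Limit of gain / Q((gamma - xi_3)/sigma) as in (124); xi must be
    sorted in increasing order so that xi[-1] plays the role of xi_3."""
    base = [interval_prob(t1, t2, x, sigma) for x in xi]
    avg = sum(pl * wl for pl, wl in zip(p, base))
    w3 = base[-1]
    return p[-1] * (math.log((1.0 - avg) / avg) - math.log((1.0 - w3) / w3))
```

For instance, with `p = [0.2, 0.3, 0.5]`, `xi = [-3.0, -1.5, 0.0]`, and the interval `[-1.0, 1.0)`, the gain is positive for moderately large `gamma`, and the ratio of the gain to `q_func(gamma - xi[-1])` approaches `limiting_slope(...)`.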

E. Centered, Variance-P Input Distribution 

We have shown that the supremum in (6) is achieved by
some input distribution that is concentrated on at most three
points and by some threshold quantizer:

\[ C(P) = I\bigl(\mathbf{p}^*, W(T^* | \boldsymbol{\xi}^*)\bigr) \tag{126} \]

where $\boldsymbol{\xi}^* \in \mathbb{R}^3$ is the location of the mass points, $\mathbf{p}^*$ is
their corresponding probabilities, $T^*$ is the threshold of the
quantizer, and $W(T^* | \boldsymbol{\xi}^*)$ is the resulting channel law. We next
show that the input distribution $(\mathbf{p}^*, \boldsymbol{\xi}^*)$ must be centered and
must satisfy the average-power constraint with equality:

\[ \sum_{\ell=1}^{3} p_\ell^* \, \xi_\ell^* = 0 \tag{127a} \]
\[ \sum_{\ell=1}^{3} p_\ell^* \, (\xi_\ell^*)^2 = P. \tag{127b} \]

*Regarding (125): $W(T_1, T_2 | \xi_3) = P(T_1, T_2)$ if, and only if, $p_1 = p_2 = 0$.
However, this would imply that $C(P) = 0$, $P > 0$, in contradiction to (103).

To show this we note that, for a fixed threshold
quantizer $T^*$, the capacity as a function of the maximal-allowed
average-power is a concave nondecreasing function that is
strictly smaller than 1 bit per channel use, and that it tends
to 1 bit per channel use as the maximal-allowed average-power
tends to infinity. Consequently, this capacity-cost function
must be strictly increasing and the second moment of $(\mathbf{p}^*, \boldsymbol{\xi}^*)$
must therefore be $P$. This argument also proves that $C(P)$ must
be strictly increasing in $P$ (because it is achieved by some
threshold quantizer). Consequently, $(\mathbf{p}^*, \boldsymbol{\xi}^*)$ must be centered
because otherwise we could shift $\boldsymbol{\xi}^*$ and $T^*$ by the mean and
thus reduce the second moment without changing the mutual
information.

VIII. Proofs: Capacity Per Unit-Energy 

A. Proof of Theorem 2 

We will lower-bound the RHS of (15) by restricting the
supremum to threshold quantizers (2) and thus demonstrate
that

\[ \dot{C}(0) \ge \frac{1}{2\sigma^2}. \tag{128} \]

Together with the upper bound (20), this will prove Theorem 2.

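To prove (128), we first note that a threshold quantizer
induces the channel

\[ P(Y = 1 \mid X = x) = Q\Bigl( \frac{T - x}{\sigma} \Bigr), \quad x \in \mathbb{R} \tag{129} \]

and $P(Y = 0 \mid X = x) = 1 - P(Y = 1 \mid X = x)$. By (15),
we thus obtain

\[ \dot{C}(0) \ge \sup_{\xi \neq 0,\ T \in \mathbb{R}} \Bigl\{ \frac{1}{\xi^2} Q\Bigl( \frac{T - \xi}{\sigma} \Bigr) \log \frac{1}{Q\bigl( \frac{T}{\sigma} \bigr)} + \frac{1}{\xi^2} Q\Bigl( \frac{T - \xi}{\sigma} \Bigr) \log Q\Bigl( \frac{T - \xi}{\sigma} \Bigr) + \frac{1}{\xi^2} \Bigl( 1 - Q\Bigl( \frac{T - \xi}{\sigma} \Bigr) \Bigr) \log \frac{1 - Q\bigl( \frac{T - \xi}{\sigma} \bigr)}{1 - Q\bigl( \frac{T}{\sigma} \bigr)} \Bigr\}. \tag{130} \]

We now change variables by defining $\mu \triangleq \xi - T$ and by
replacing the supremum over $(\xi, T)$ with the supremum over
$(\xi, \mu)$. This latter supremum we lower-bound by taking $\xi$ to
infinity while holding $\mu$ fixed. This yields for the last two
terms on the RHS of (130)

\[ \lim_{\xi \to \infty} \frac{1}{\xi^2} Q\Bigl( -\frac{\mu}{\sigma} \Bigr) \log Q\Bigl( -\frac{\mu}{\sigma} \Bigr) = 0 \tag{131} \]

and

\[ \lim_{\xi \to \infty} \frac{1}{\xi^2} \Bigl( 1 - Q\Bigl( -\frac{\mu}{\sigma} \Bigr) \Bigr) \log \frac{1 - Q\bigl( -\frac{\mu}{\sigma} \bigr)}{1 - Q\bigl( \frac{\xi - \mu}{\sigma} \bigr)} = 0. \tag{132} \]

We use the upper bound on the Q-function (46) to lower-bound
the first term on the RHS of (130) as

\[ \lim_{\xi \to \infty} \frac{1}{\xi^2} Q\Bigl( -\frac{\mu}{\sigma} \Bigr) \log \frac{1}{Q\bigl( \frac{\xi - \mu}{\sigma} \bigr)} \ge Q\Bigl( -\frac{\mu}{\sigma} \Bigr) \lim_{\xi \to \infty} \frac{\frac{(\xi - \mu)^2}{2\sigma^2} + \frac{1}{2} \log(2\pi) + \log \frac{\xi - \mu}{\sigma}}{\xi^2} = Q\Bigl( -\frac{\mu}{\sigma} \Bigr) \frac{1}{2\sigma^2}. \tag{133} \]

Combining (131)-(133) with (130) yields

\[ \dot{C}(0) \ge Q\Bigl( -\frac{\mu}{\sigma} \Bigr) \frac{1}{2\sigma^2} \tag{134} \]

from which we obtain (128) by letting $\mu$ tend to infinity. This
proves Theorem 2.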

Note that (15) is achieved by binary on-off keying (29),
see [11]. By showing that (15) is lower-bounded by $1/(2\sigma^2)$
as we take $\xi$ to infinity, we thus implicitly show that $\dot{C}(0)$
is achieved by binary on-off keying where the nonzero mass
point tends to infinity as $P$ tends to zero.
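The slow approach to $1/(2\sigma^2)$ can be seen numerically. The following sketch assumes a threshold quantizer with $P(Y = 1 | X = x) = Q((T - x)/\sigma)$ and evaluates the relative entropy between the output distributions under $X = \xi$ and $X = 0$, divided by $\xi^2$, in nats. The function name is illustrative, and very large thresholds are outside its range because $Q(T/\sigma)$ underflows in double precision.

```python
import math


def q_func(x: float) -> float:
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def divergence_per_energy(xi, t, sigma=1.0):
    """D(P_{Y|X=xi} || P_{Y|X=0}) / xi^2 for a threshold quantizer with
    threshold t. Requires xi != 0 and Q(t/sigma) > 0 in floating point
    (roughly t/sigma below 37)."""
    a = q_func((t - xi) / sigma)      # Pr(Y = 1 | X = xi)
    a_c = q_func((xi - t) / sigma)    # Pr(Y = 0 | X = xi)
    b = q_func(t / sigma)             # Pr(Y = 1 | X = 0)
    b_c = q_func(-t / sigma)          # Pr(Y = 0 | X = 0)
    d = 0.0
    if a > 0.0:
        d += a * math.log(a / b)
    if a_c > 0.0:
        d += a_c * math.log(a_c / b_c)
    return d / xi ** 2
```

With $\sigma = 1$ and $\mu = \xi - T = 3$, the value at $\xi = 30$ is already above $1/\pi$, the small-$\xi$ value obtained with the symmetric threshold $T = 0$, while remaining below the limit $1/2$.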

B. Proof of Theorem 3 

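We first argue that in order to prove Theorem 3 it suffices
to show that for every fixed $\nu > 0$

\[ \sup_{\mathcal{D},\ 0 < \xi^2 \le \nu} \frac{D\bigl(P_{Y|X=\xi} \,\big\|\, P_{Y|X=0}\bigr)}{\xi^2} < \frac{1}{2\sigma^2}. \tag{135} \]

Suppose then that this strict inequality holds for every
$\nu > 0$. Consider a family of quantizers and input distributions
parametrized by $P$ with $E[X^2] \le P$. By [11, Eq. (15)], it
follows that for every $\nu > 0$

\[ \frac{I(X;Y)}{P} \le \int_{x^2 \le \nu} \frac{D\bigl(P_{Y|X=x} \,\big\|\, P_{Y|X=0}\bigr)}{x^2} \frac{x^2}{P} \, dP_X(x) + \int_{x^2 > \nu} \frac{D\bigl(P_{Y|X=x} \,\big\|\, P_{Y|X=0}\bigr)}{x^2} \frac{x^2}{P} \, dP_X(x) \]
\[ \le \sup_{\mathcal{D},\ 0 < \xi^2 \le \nu} \Bigl\{ \frac{D\bigl(P_{Y|X=\xi} \,\big\|\, P_{Y|X=0}\bigr)}{\xi^2} \Bigr\} \frac{E[X^2 \, I\{X^2 \le \nu\}]}{P} + \sup_{\mathcal{D},\ \xi^2 > \nu} \Bigl\{ \frac{D\bigl(P_{Y|X=\xi} \,\big\|\, P_{Y|X=0}\bigr)}{\xi^2} \Bigr\} \frac{E[X^2 \, I\{X^2 > \nu\}]}{P} \]
\[ = \sup_{\mathcal{D},\ 0 < \xi^2 \le \nu} \Bigl\{ \frac{D\bigl(P_{Y|X=\xi} \,\big\|\, P_{Y|X=0}\bigr)}{\xi^2} \Bigr\} \frac{E[X^2 \, I\{X^2 \le \nu\}]}{P} + \frac{1}{2\sigma^2} \frac{E[X^2 \, I\{X^2 > \nu\}]}{P} \tag{136} \]

where the last step follows because the capacity per unit-energy
can be achieved by binary on-off keying where the
nonzero mass point tends to infinity (see Section VIII-A), so

\[ \sup_{\mathcal{D},\ \xi^2 > \nu} \frac{D\bigl(P_{Y|X=\xi} \,\big\|\, P_{Y|X=0}\bigr)}{\xi^2} = \frac{1}{2\sigma^2}. \tag{137} \]

Taking the limit as $P$ tends to zero on both sides of (136)
yields

\[ \varliminf_{P \downarrow 0} \frac{I(X;Y)}{P} \le \varliminf_{P \downarrow 0} \Bigl\{ \frac{1}{2\sigma^2} \frac{E[X^2 \, I\{X^2 > \nu\}]}{P} + \sup_{\mathcal{D},\ 0 < \xi^2 \le \nu} \Bigl\{ \frac{D\bigl(P_{Y|X=\xi} \,\big\|\, P_{Y|X=0}\bigr)}{\xi^2} \Bigr\} \frac{E[X^2 \, I\{X^2 \le \nu\}]}{P} \Bigr\} \le \frac{1}{2\sigma^2} \tag{138} \]

where $\varliminf$ denotes the limit inferior. Here the last step follows
from (135) and from the average-power constraint

\[ \frac{E[X^2 \, I\{X^2 > \nu\}]}{P} + \frac{E[X^2 \, I\{X^2 \le \nu\}]}{P} \le 1. \tag{139} \]

Since the inequality in (135) is strict for every $\nu > 0$, it follows
from (139) that the last line in (138) can hold with equality
for every $\nu > 0$ only if for every $\nu > 0$

\[ \lim_{P \downarrow 0} \frac{E[X^2 \, I\{X^2 > \nu\}]}{P} = 1. \tag{140} \]

Thus, if (135) holds, then every family of distributions of $X$
satisfying $E[X^2] \le P$ that achieves

\[ \lim_{P \downarrow 0} \frac{I(X;Y)}{P} = \frac{1}{2\sigma^2} \tag{141} \]

must be flash signaling, thus proving Theorem 3.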
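Condition (140) can be made concrete with two small families of input distributions. The sketch below computes the fraction of the input energy carried by symbols with $x^2 > \nu$; the on-off family with nonzero mass point $\xi = P^{-1/4}$ and the antipodal family $\pm\sqrt{P}$ are illustrative choices, not constructions taken from the paper.

```python
def energy_fraction_above(points, probs, nu):
    """E[X^2 I{X^2 > nu}] / E[X^2] for a discrete input distribution."""
    total = sum(p * x * x for x, p in zip(points, probs))
    if total == 0.0:
        raise ValueError("input distribution has zero energy")
    above = sum(p * x * x for x, p in zip(points, probs) if x * x > nu)
    return above / total


def on_off_keying(power, xi):
    """Binary on-off keying with nonzero mass point xi and E[X^2] = power."""
    mass = power / xi ** 2
    return [0.0, xi], [1.0 - mass, mass]


def antipodal(power):
    """Equiprobable inputs at +/- sqrt(power)."""
    r = power ** 0.5
    return [-r, r], [0.5, 0.5]
```

For small $P$ the on-off family places all of its energy above any fixed $\nu$, which is the behavior required by (140), whereas the antipodal family places none of it there.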

Having established that in order to prove Theorem 3 it
suffices to show that (135) holds for every $\nu > 0$, we now
proceed to do so. We first note that, for every $\xi \neq 0$, the
supremum in (135) over all quantizers $\mathcal{D}$ can be replaced with
the supremum over all threshold quantizers. Indeed, let

\[ \mathcal{W} \triangleq \bigl\{ (\omega_1, \omega_2) \in [0,1]^2 : \omega_1 = \Pr(\tilde{Y} \in \mathcal{D} \mid X = \xi),\ \omega_2 = \Pr(\tilde{Y} \in \mathcal{D} \mid X = 0),\ \mathcal{D} \subseteq \mathbb{R} \bigr\} \]

denote the set of possible conditional probability distributions
$(P_{Y|X=\xi}(1), P_{Y|X=0}(1))$ that different quantizers can induce.
Applying the methods of Section VII-B, it can be shown that
the extreme points of $\overline{\mathcal{W}}$ correspond to threshold quantizers.
(Recall that $\overline{\mathcal{W}}$ denotes the closure of the convex hull of $\mathcal{W}$.)
Indeed, for binary inputs, the support function $f(\cdot)$ is given
by (87) but with $\lambda_3 = 0$. The quantization region $\mathcal{D}^*(\boldsymbol{\lambda})$ that
achieves the supremum in (87) consists of the set of $y \in \mathbb{R}$ for
which $g_{\boldsymbol{\lambda}}(y)$ in (90) is nonnegative. Since $g_{\boldsymbol{\lambda}}(\cdot)$ has at most
one zero, it follows that $\mathcal{D}^*(\boldsymbol{\lambda})$ consists of at most two regions,
i.e., it is a threshold quantizer. Since the relative entropy on the
LHS of (135) is convex in $(P_{Y|X=\xi}, P_{Y|X=0})$ [10, Th. 2.7.2],
it follows by the same arguments as in Section VII-B that, for
every $\xi \neq 0$, $D(P_{Y|X=\xi} \| P_{Y|X=0})$ is maximized by some
threshold quantizer.

We next note that we can assume, without loss of optimality, that the threshold $T$ of the quantizer is nonnegative, so the supremum over $\mathcal{D}$ can be replaced by a supremum over threshold quantizers of nonnegative thresholds $T \ge 0$. Indeed, for $x \in \mathbb{R}$

$$
\Pr(\tilde{Y} \ge T \mid X = x) = 1 - \Pr(\tilde{Y} \ge -T \mid X = -x) \tag{142}
$$

and consequently,

$$
D(P_{Y|X=\xi} \,\|\, P_{Y|X=0})\Big|_{\mathcal{D} = \{\tilde{y} \in \mathbb{R} :\, \tilde{y} \ge T\}}
= D(P_{Y|X=-\xi} \,\|\, P_{Y|X=0})\Big|_{\mathcal{D} = \{\tilde{y} \in \mathbb{R} :\, \tilde{y} \ge -T\}} \tag{143}
$$

thus demonstrating that to every pair $(\xi, T)$ there corresponds another pair $(-\xi, -T)$ achieving the same relative entropy. Since $\xi$ and $-\xi$ have the same square, we can assume without loss of generality that $T$ is nonnegative.
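We next define the random variable $U$ as

$$
U \triangleq \tilde{Y}\, I\{\tilde{Y} \ge 0\}. \tag{144}
$$

Note that, for $T \ge 0$, the quantizer's output $Y$ can be expressed as a function of $U$. It thus follows from the Data Processing Inequality for relative entropy [10, Sec. 2.9] that

$$
\begin{aligned}
D(P_{Y|X=\xi} \,\|\, P_{Y|X=0})
&\le D(P_{U|X=\xi} \,\|\, P_{U|X=0}) \\
&= \int_{-\infty}^{0} \frac{e^{-\frac{(\tilde{y}-\xi)^2}{2\sigma^2}}}{\sqrt{2\pi\sigma^2}}\,\mathrm{d}\tilde{y}\; \log \frac{\int_{-\infty}^{0} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(\tilde{y}-\xi)^2}{2\sigma^2}}\,\mathrm{d}\tilde{y}}{\int_{-\infty}^{0} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{\tilde{y}^2}{2\sigma^2}}\,\mathrm{d}\tilde{y}}
+ \int_{0}^{\infty} \frac{e^{-\frac{(\tilde{y}-\xi)^2}{2\sigma^2}}}{\sqrt{2\pi\sigma^2}}\, \log \frac{e^{-\frac{(\tilde{y}-\xi)^2}{2\sigma^2}}}{e^{-\frac{\tilde{y}^2}{2\sigma^2}}}\,\mathrm{d}\tilde{y} \\
&\triangleq \Psi(\xi)
\end{aligned}
\tag{145}
$$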



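irrespective of the threshold $T \ge 0$. Here the last equality should be viewed as the definition of $\Psi(\xi)$. By applying the Log-Sum Inequality [10, Th. 2.7.1] to $\Psi(\xi)$ we obtain

$$
\Psi(\xi) \le \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(\tilde{y}-\xi)^2}{2\sigma^2}} \log \frac{e^{-\frac{(\tilde{y}-\xi)^2}{2\sigma^2}}}{e^{-\frac{\tilde{y}^2}{2\sigma^2}}}\,\mathrm{d}\tilde{y} = \frac{\xi^2}{2\sigma^2} \tag{146}
$$

with equality if, and only if,

$$
\frac{e^{-\frac{(\tilde{y}-\xi)^2}{2\sigma^2}}}{e^{-\frac{\tilde{y}^2}{2\sigma^2}}} = 2\,Q\!\left(\frac{\xi}{\sigma}\right), \quad \text{for almost every } \tilde{y} \le 0. \tag{147}
$$

Since (147) does not hold for $\xi \ne 0$, this yields

$$
\Psi(\xi) < \frac{\xi^2}{2\sigma^2}, \quad \xi \ne 0. \tag{148}
$$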

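Note that (148) and (146) give an upper bound on the relative entropy that does not depend on the threshold. By combining (145) and (148), and recalling that for every $\xi \ne 0$ the relative entropy in (135) is maximized by some threshold quantizer, we obtain

$$
\sup_{\mathcal{D}} \frac{D(P_{Y|X=\xi} \,\|\, P_{Y|X=0})}{\xi^2} \le \frac{\Psi(\xi)}{\xi^2} < \frac{1}{2\sigma^2} \tag{149}
$$

for $\xi \ne 0$. Maximizing over $\xi^2 \le \nu$, this yields (135) by noting that the function $\xi \mapsto \xi^{-2}\Psi(\xi)$ is continuous on $\mathbb{R} \setminus \{0\}$ and by noting that, as shown in Appendix III,

$$
\lim_{\xi \to 0} \frac{\Psi(\xi)}{\xi^2} = \frac{1}{2\sigma^2}\left(\frac{1}{2} + \frac{1}{\pi}\right) < \frac{1}{2\sigma^2}. \tag{150}
$$

This proves Theorem 3.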
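The bound (148) and the small-$\xi$ limit of $\Psi(\xi)/\xi^2$ can be checked numerically. The following is a minimal sketch, not part of the original proof. It assumes natural logarithms and relies on a closed form for $\Psi(\xi)$ that we obtained by evaluating the two integrals in (145), namely $\Psi(\xi) = Q(\xi/\sigma)\log(2Q(\xi/\sigma)) + \frac{\xi^2}{2\sigma^2}[1 - Q(\xi/\sigma)] + \frac{\xi}{\sigma\sqrt{2\pi}}e^{-\xi^2/(2\sigma^2)}$. The function names `q_func`, `psi`, and `psi_ratio` are illustrative.

```python
import math


def q_func(x: float) -> float:
    """Gaussian tail probability Q(x) = Pr(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def psi(xi: float, sigma: float = 1.0) -> float:
    """Psi(xi) = D(P_{U|X=xi} || P_{U|X=0}) in nats (assumed closed form)."""
    q = q_func(xi / sigma)
    # Contribution of the mass point of U at zero.
    atom = q * math.log(2.0 * q)
    # Contribution of the density of U on the positive half-line.
    dens = (xi ** 2 / (2.0 * sigma ** 2)) * (1.0 - q)
    dens += xi / (sigma * math.sqrt(2.0 * math.pi)) * math.exp(-xi ** 2 / (2.0 * sigma ** 2))
    return atom + dens


def psi_ratio(xi: float, sigma: float = 1.0) -> float:
    """Ratio Psi(xi) / xi^2, to be compared with 1 / (2 sigma^2)."""
    return psi(xi, sigma) / xi ** 2


if __name__ == "__main__":
    for xi in (-2.0, -0.5, 1e-3, 0.5, 2.0):
        print(f"xi={xi:+.3f}  Psi/xi^2={psi_ratio(xi):.6f}  1/(2 sigma^2)=0.5")
```

For moderate $|\xi|$ the ratio stays strictly below $1/(2\sigma^2)$. It approaches $1/(2\sigma^2)$ only as $\xi$ grows, which is why (135) restricts the supremum to $\xi^2 \le \nu$.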



C. Proof of Corollary 1 

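To prove Corollary 1 we need to show that for every $\nu > 0$ and every threshold quantizer with threshold $0 \le T \le \nu$,

$$
\sup_{\xi \ne 0,\; 0 \le T \le \nu} \frac{D(P_{Y|X=\xi} \,\|\, P_{Y|X=0})}{\xi^2} < \frac{1}{2\sigma^2}. \tag{151}
$$

By (149) we obtain that for every $\xi \ne 0$ and every $\nu > 0$

$$
\sup_{0 \le T \le \nu} \frac{D(P_{Y|X=\xi} \,\|\, P_{Y|X=0})}{\xi^2} \le \frac{\Psi(\xi)}{\xi^2} < \frac{1}{2\sigma^2} \tag{152}
$$

where $\xi \mapsto \xi^{-2}\Psi(\xi)$ is continuous on $\mathbb{R} \setminus \{0\}$ and satisfies (150). To conclude the proof of the corollary it thus remains to show that for every $\nu > 0$

$$
\varlimsup_{\xi^2 \to \infty}\; \sup_{0 \le T \le \nu} \frac{D(P_{Y|X=\xi} \,\|\, P_{Y|X=0})}{\xi^2} < \frac{1}{2\sigma^2} \tag{153}
$$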



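where $\varlimsup$ denotes the limit superior. This can be done by noting that for $0 \le T \le \nu$

$$
\begin{aligned}
D(P_{Y|X=\xi} \,\|\, P_{Y|X=0})
&= Q\!\left(\frac{T-\xi}{\sigma}\right) \log \frac{1}{Q\!\left(\frac{T}{\sigma}\right)} + \left[1 - Q\!\left(\frac{T-\xi}{\sigma}\right)\right] \log \frac{1}{1 - Q\!\left(\frac{T}{\sigma}\right)} - H_b\!\left(Q\!\left(\frac{T-\xi}{\sigma}\right)\right) \\
&\le \log \frac{1}{Q\!\left(\frac{T}{\sigma}\right)} + \log \frac{1}{1 - Q\!\left(\frac{T}{\sigma}\right)} \\
&\le \log \frac{1}{Q\!\left(\frac{\nu}{\sigma}\right)} + \log \frac{1}{1 - Q(0)}
\end{aligned}
\tag{154}
$$

where the second step follows because $0 \le Q(x) \le 1$, $x \in \mathbb{R}$ and $H_b(p) \ge 0$, $0 \le p \le 1$; and where the last step follows because $x \mapsto Q(x)$ is monotonically decreasing in $x \in \mathbb{R}$ and because $0 \le T \le \nu$. Computing the limiting ratio of the RHS of (154) to $\xi^2$ as $\xi^2$ tends to infinity yields for every $\nu > 0$

$$
\varlimsup_{\xi^2 \to \infty}\; \sup_{0 \le T \le \nu} \frac{D(P_{Y|X=\xi} \,\|\, P_{Y|X=0})}{\xi^2} = 0 \tag{155}
$$

thus establishing (153). This concludes the proof of the corollary.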
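The contrast between bounded thresholds and thresholds that grow with the mass point can be illustrated numerically. The following is a minimal sketch under stated assumptions: natural logarithms, a threshold quantizer of the form $Y = I\{\tilde{Y} \ge T\}$, and the hypothetical helper names `kl_bernoulli` and `kl_threshold`. The growing threshold $T = \mu\xi$ is our illustrative choice, not a construction taken from the proof above.

```python
import math


def q_func(x: float) -> float:
    """Gaussian tail probability Q(x) = Pr(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def kl_bernoulli(a: float, b: float) -> float:
    """Relative entropy D(Bern(a) || Bern(b)) in nats, for 0 < b < 1."""
    total = 0.0
    if a > 0.0:
        total += a * math.log(a / b)
    if a < 1.0:
        total += (1.0 - a) * math.log((1.0 - a) / (1.0 - b))
    return total


def kl_threshold(xi: float, thr: float, sigma: float = 1.0) -> float:
    """D(P_{Y|X=xi} || P_{Y|X=0}) for the quantizer Y = 1{Ytilde >= thr}."""
    a = q_func((thr - xi) / sigma)  # Pr(Y = 1 | X = xi)
    b = q_func(thr / sigma)         # Pr(Y = 1 | X = 0)
    return kl_bernoulli(a, b)


if __name__ == "__main__":
    xi, nu, mu = 20.0, 2.0, 0.9
    # Thresholds restricted to [0, nu]: the ratio D / xi^2 is small.
    bounded = max(kl_threshold(xi, nu * k / 200.0) for k in range(201)) / xi ** 2
    # Threshold growing with xi: the ratio is close to mu^2 / (2 sigma^2).
    growing = kl_threshold(xi, mu * xi) / xi ** 2
    print(f"bounded thresholds: {bounded:.4f}")
    print(f"threshold mu*xi:    {growing:.4f}  (1/(2 sigma^2) = 0.5)")
```

With thresholds confined to $[0, \nu]$ the ratio vanishes as $\xi^2$ grows, in line with (155), whereas a threshold that scales with $\xi$ keeps the ratio near $\mu^2/(2\sigma^2)$.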
IX. Proofs: Peak-Power-Limited Channels
A. Proof of Proposition 1

The peak-power-limited Gaussian channel with one-bit output quantization is a memoryless channel with a continuous input taking values in $[-\sqrt{P}, \sqrt{P}]$ and a binary output. It thus follows from Dubins's Theorem that, for every quantization region $\mathcal{D}$, the capacity-achieving input distribution is discrete with two mass points [23, Sec. II-C]. We shall denote these two mass points by $\xi_1$ and $\xi_2$.

We next show that threshold quantizers are optimal. Let $\mathcal{W}$ denote the set of all possible channel laws, i.e.,

$$
\mathcal{W} \triangleq \Bigl\{ (\omega_1, \omega_2) \in [0,1]^2 : \omega_\ell = \Pr(\tilde{Y} \in \mathcal{D} \mid X = \xi_\ell),\ \mathcal{D} \subset \mathbb{R} \Bigr\}.
$$

Applying the methods of Section VII-B to binary channel inputs, it can be shown that the extreme points of $\overline{\mathcal{W}}$ correspond to threshold quantizers (2) or complements thereof. (For more details, see also Section VIII-B.) By the same arguments as in Section VII-B, it follows that for every binary random variable $X$, the mutual information $I(X;Y)$ is maximized by some threshold quantizer.

The capacity of the peak-power-limited Gaussian channel with one-bit output quantization is thus given by

$$
C_{\mathrm{PP}}(P) = \sup_{(\mathbf{p}, \boldsymbol{\xi}),\, T \in \mathbb{R}} I\bigl(\mathbf{p}, W(T|\boldsymbol{\xi})\bigr) \tag{156}
$$

where $(\mathbf{p}, \boldsymbol{\xi})$ denotes the two-mass-points distribution with masses $\mathbf{p} = (p_1, p_2) \in [0,1]^2$ and locations $\boldsymbol{\xi} = (\xi_1, \xi_2) \in [-\sqrt{P}, \sqrt{P}]^2$, and where $W(T|\boldsymbol{\xi})$ denotes the channel law corresponding to the threshold quantizer (2) and to the mass points $(\xi_1, \xi_2)$:

$$
W(T|\xi_\ell) = \Pr(\tilde{Y} \ge T \mid X = \xi_\ell), \quad \ell = 1, 2. \tag{157}
$$

Following the steps in Section VII-C, it can be further shown that the supremum on the RHS of (156) is achieved.

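In the following, we demonstrate that there is no loss in optimality in assuming that the mass points of the capacity-achieving input distribution are located at $-\sqrt{P}$ and $\sqrt{P}$. Indeed, suppose that the optimal mass points are located at

$$
-\sqrt{P} \le \xi_1 < \xi_2 < \sqrt{P}. \tag{158}
$$

Then, it follows from the strict monotonicity of the Q-function that

$$
Q\!\left(\frac{T - \xi_1}{\sigma}\right) < Q\!\left(\frac{T - \xi_2}{\sigma}\right) < Q\!\left(\frac{T - \sqrt{P}}{\sigma}\right). \tag{159}
$$

Since $W(T|\xi_1)$ does not depend on $\xi_2$, this implies that for every $T$ and $\xi_1$, the channel law $W(T|\boldsymbol{\xi})$ can be written as a convex combination of $W(T|\boldsymbol{\psi})$ and $W(T|\boldsymbol{\zeta})$, where $\boldsymbol{\psi} = (\xi_1, \xi_1)$ and $\boldsymbol{\zeta} = (\xi_1, \sqrt{P})$. By the convexity of mutual information in the channel law, and by noting that $I(\mathbf{p}, W(T|\boldsymbol{\psi})) = 0$, it follows that

$$
I\bigl(\mathbf{p}, W(T|\boldsymbol{\xi})\bigr) \le I\bigl(\mathbf{p}, W(T|\boldsymbol{\zeta})\bigr) \tag{160}
$$

for every $T$ and $(\mathbf{p}, \boldsymbol{\xi})$ satisfying (158). Thus, $\xi_2 = \sqrt{P}$ achieves the capacity. By repeating the same arguments for $\xi_1$, we obtain that the mass points of the capacity-achieving input distribution are located at $-\sqrt{P}$ and $\sqrt{P}$. We thus have

$$
C_{\mathrm{PP}}(P) = \max_{T \in \mathbb{R}} C_T(P) \tag{161}
$$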



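where $C_T(P)$ is the capacity of the binary asymmetric channel with crossover probabilities

$$
W(0|1) = Q\!\left(\frac{\sqrt{P} - T}{\sigma}\right) \tag{162a}
$$

$$
W(1|0) = Q\!\left(\frac{\sqrt{P} + T}{\sigma}\right). \tag{162b}
$$

For every $T \in \mathbb{R}$, the capacity of the binary asymmetric channel can be computed as

$$
C_T(P) = \log\bigl(1 + e^{-\epsilon}\bigr) + \epsilon\, W(1|0) - H_b\bigl(W(1|0)\bigr) \tag{163}
$$

where

$$
\epsilon \triangleq \frac{H_b\bigl(W(0|1)\bigr) - H_b\bigl(W(1|0)\bigr)}{1 - W(0|1) - W(1|0)}. \tag{164}
$$

Combining (163), (162a), and (162b) with (161) yields

$$
C_{\mathrm{PP}}(P) = \max_{T \in \mathbb{R}} \left\{ \log\bigl(1 + e^{-\epsilon(P,T)}\bigr) + \epsilon(P,T)\, Q\!\left(\frac{\sqrt{P} + T}{\sigma}\right) - H_b\!\left(Q\!\left(\frac{\sqrt{P} + T}{\sigma}\right)\right) \right\} \tag{165}
$$

where

$$
\epsilon(P,T) \triangleq \frac{H_b\!\left(Q\!\left(\frac{\sqrt{P} - T}{\sigma}\right)\right) - H_b\!\left(Q\!\left(\frac{\sqrt{P} + T}{\sigma}\right)\right)}{1 - Q\!\left(\frac{\sqrt{P} - T}{\sigma}\right) - Q\!\left(\frac{\sqrt{P} + T}{\sigma}\right)}. \tag{166}
$$

Proposition 1 follows then by noting that the RHS of (165) is symmetric in $T \in \mathbb{R}$.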
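The closed-form capacity of the binary asymmetric channel in (163) can be cross-checked against a direct maximization of the mutual information over the input distribution. The following is a minimal sketch assuming natural logarithms; the helper names (`capacity_closed_form`, `capacity_grid`, and so on) are illustrative, and the grid search is a simple stand-in for an exact optimization.

```python
import math


def q_func(x: float) -> float:
    """Gaussian tail probability Q(x) = Pr(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def hb(p: float) -> float:
    """Binary entropy function in nats."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def crossovers(power: float, thr: float, sigma: float = 1.0):
    """Crossover probabilities W(0|1) and W(1|0) of (162a) and (162b)."""
    w01 = q_func((math.sqrt(power) - thr) / sigma)
    w10 = q_func((math.sqrt(power) + thr) / sigma)
    return w01, w10


def capacity_closed_form(power: float, thr: float, sigma: float = 1.0) -> float:
    """Capacity C_T(P) of the binary asymmetric channel via (163)."""
    w01, w10 = crossovers(power, thr, sigma)
    eps = (hb(w01) - hb(w10)) / (1.0 - w01 - w10)
    return math.log(1.0 + math.exp(-eps)) + eps * w10 - hb(w10)


def capacity_grid(power: float, thr: float, sigma: float = 1.0, n: int = 20001) -> float:
    """Maximize I(X;Y) over Pr(X = +sqrt(P)) on a uniform grid."""
    w01, w10 = crossovers(power, thr, sigma)
    best = 0.0
    for k in range(n):
        p1 = k / (n - 1)
        py1 = p1 * (1.0 - w01) + (1.0 - p1) * w10  # Pr(Y = 1)
        best = max(best, hb(py1) - p1 * hb(w01) - (1.0 - p1) * hb(w10))
    return best


if __name__ == "__main__":
    for thr in (-0.3, 0.0, 0.3):
        print(thr, capacity_closed_form(1.0, thr), capacity_grid(1.0, thr))
```

The values for $T$ and $-T$ coincide, which is the symmetry used in the last step of the proof.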



B. Proof of Proposition 2 

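It was shown in the previous section that the capacity is achieved with a threshold quantizer and a binary input distribution having mass points at $\sqrt{P}$ and $-\sqrt{P}$. Thus, the capacity can be expressed as

$$
C_{\mathrm{PP}}(P) = \max_{T \ge 0} \left\{ H_b\!\left( p_+ Q\!\left(\frac{T - A}{\sigma}\right) + p_- Q\!\left(\frac{T + A}{\sigma}\right) \right) - p_+ H_b\!\left( Q\!\left(\frac{T - A}{\sigma}\right) \right) - p_- H_b\!\left( Q\!\left(\frac{T + A}{\sigma}\right) \right) \right\} \tag{167}
$$

for some probabilities $0 \le p_+ \le 1$ and $0 \le p_- \le 1$ satisfying $p_+ + p_- = 1$. Here we have introduced $A \triangleq \sqrt{P}$ to simplify notation.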
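Expanding $H_b(\cdot)$ as a Taylor series around $Q(T/\sigma)$, we obtain for the first term on the RHS of (167)

$$
\begin{aligned}
H_b\!\left( p_+ Q\!\left(\tfrac{T - A}{\sigma}\right) + p_- Q\!\left(\tfrac{T + A}{\sigma}\right) \right)
&= H_b\!\left( Q\!\left(\tfrac{T}{\sigma}\right) \right) + \log \frac{1 - Q\!\left(\tfrac{T}{\sigma}\right)}{Q\!\left(\tfrac{T}{\sigma}\right)} \left[ p_+ Q\!\left(\tfrac{T - A}{\sigma}\right) + p_- Q\!\left(\tfrac{T + A}{\sigma}\right) - Q\!\left(\tfrac{T}{\sigma}\right) \right] \\
&\quad - \frac{\left[ p_+ Q\!\left(\tfrac{T - A}{\sigma}\right) + p_- Q\!\left(\tfrac{T + A}{\sigma}\right) - Q\!\left(\tfrac{T}{\sigma}\right) \right]^2}{2\, Q\!\left(\tfrac{T}{\sigma}\right) \left[ 1 - Q\!\left(\tfrac{T}{\sigma}\right) \right]} + R_H(A, T, p_+)
\end{aligned}
\tag{168}
$$

where

$$
R_H(A, T, p_+) \triangleq \frac{1}{3!}\, \frac{1 - 2\tilde{p}}{\tilde{p}^2 (1 - \tilde{p})^2} \left[ p_+ Q\!\left(\tfrac{T - A}{\sigma}\right) + p_- Q\!\left(\tfrac{T + A}{\sigma}\right) - Q\!\left(\tfrac{T}{\sigma}\right) \right]^3 \tag{169}
$$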



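for some $\tilde{p} \in [Q((T + A)/\sigma), Q((T - A)/\sigma)]$. Expanding the Q-function as a Taylor series around $T/\sigma$ yields

$$
p_+ Q\!\left(\frac{T - A}{\sigma}\right) + p_- Q\!\left(\frac{T + A}{\sigma}\right) - Q\!\left(\frac{T}{\sigma}\right) = (p_+ - p_-)\, \frac{A}{\sigma}\, \frac{1}{\sqrt{2\pi}}\, e^{-\frac{T^2}{2\sigma^2}} + R_Q(A, T, p_+) \tag{170}
$$

where

$$
R_Q(A, T, p_+) \triangleq \frac{A^2}{2\sigma^2}\, \frac{\tilde{x}}{\sigma}\, \frac{1}{\sqrt{2\pi}}\, e^{-\frac{\tilde{x}^2}{2\sigma^2}} \tag{171}
$$

for some $\tilde{x} \in [T - A, T + A]$. Note that $|\tilde{x} \exp(-\tilde{x}^2/(2\sigma^2))| \le \sigma/\sqrt{e}$, so the remainder satisfies

$$
|R_Q(A, T, p_+)| \le \frac{A^2}{2\sigma^2 \sqrt{2\pi e}}, \quad 0 \le p_+ \le 1. \tag{172}
$$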



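Combining (170) with (168), we obtain for the first term on the RHS of (167)

$$
\begin{aligned}
H_b\!\left( p_+ Q\!\left(\tfrac{T - A}{\sigma}\right) + p_- Q\!\left(\tfrac{T + A}{\sigma}\right) \right)
&= H_b\!\left( Q\!\left(\tfrac{T}{\sigma}\right) \right) + \log \frac{1 - Q\!\left(\tfrac{T}{\sigma}\right)}{Q\!\left(\tfrac{T}{\sigma}\right)} \left[ p_+ Q\!\left(\tfrac{T - A}{\sigma}\right) + p_- Q\!\left(\tfrac{T + A}{\sigma}\right) - Q\!\left(\tfrac{T}{\sigma}\right) \right] \\
&\quad - \frac{\left[ (p_+ - p_-)\, \frac{A}{\sigma}\, \frac{1}{\sqrt{2\pi}}\, e^{-\frac{T^2}{2\sigma^2}} + R_Q(A, T, p_+) \right]^2}{2\, Q\!\left(\tfrac{T}{\sigma}\right) \left[ 1 - Q\!\left(\tfrac{T}{\sigma}\right) \right]} + R_H(A, T, p_+) \\
&= H_b\!\left( Q\!\left(\tfrac{T}{\sigma}\right) \right) + \log \frac{1 - Q\!\left(\tfrac{T}{\sigma}\right)}{Q\!\left(\tfrac{T}{\sigma}\right)} \left[ p_+ Q\!\left(\tfrac{T - A}{\sigma}\right) + p_- Q\!\left(\tfrac{T + A}{\sigma}\right) - Q\!\left(\tfrac{T}{\sigma}\right) \right] \\
&\quad - \frac{A^2}{\sigma^2}\, \frac{(p_+ - p_-)^2\, e^{-\frac{T^2}{\sigma^2}}}{4\pi\, Q\!\left(\tfrac{T}{\sigma}\right) \left[ 1 - Q\!\left(\tfrac{T}{\sigma}\right) \right]} + K(A, T, p_+) + R_H(A, T, p_+)
\end{aligned}
\tag{173}
$$

where

$$
K(A, T, p_+) \triangleq - \frac{2 (p_+ - p_-)\, \frac{A}{\sigma}\, \frac{1}{\sqrt{2\pi}}\, e^{-\frac{T^2}{2\sigma^2}}\, R_Q(A, T, p_+)}{2\, Q\!\left(\tfrac{T}{\sigma}\right) \left[ 1 - Q\!\left(\tfrac{T}{\sigma}\right) \right]} - \frac{|R_Q(A, T, p_+)|^2}{2\, Q\!\left(\tfrac{T}{\sigma}\right) \left[ 1 - Q\!\left(\tfrac{T}{\sigma}\right) \right]}. \tag{174}
$$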



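Taylor-series expansions for the last two terms on the RHS of (167) follow directly from (173) by setting $p_+$ to 1 or to 0. Thus, applying (173) to (167) and using that $p_+ + p_- = 1$ yields

$$
\begin{aligned}
C_{\mathrm{PP}}(P) = \max_{T \ge 0} \Biggl\{ & \frac{A^2}{\sigma^2}\, \frac{\bigl[ 1 - (p_+ - p_-)^2 \bigr]\, e^{-\frac{T^2}{\sigma^2}}}{4\pi\, Q\!\left(\tfrac{T}{\sigma}\right) \left[ 1 - Q\!\left(\tfrac{T}{\sigma}\right) \right]} + K(A, T, p_+) + R_H(A, T, p_+) \\
& - p_+ \bigl[ K(A, T, 1) + R_H(A, T, 1) \bigr] - p_- \bigl[ K(A, T, 0) + R_H(A, T, 0) \bigr] \Biggr\}.
\end{aligned}
\tag{175}
$$

As shown in Appendix IV, we have

$$
\lim_{A \downarrow 0}\, \sup_{T \ge 0} \frac{|R_H(A, T, p_+)|}{A^2} = 0, \quad 0 \le p_+ \le 1 \tag{176a}
$$

$$
\lim_{A \downarrow 0}\, \sup_{T \ge 0} \frac{|K(A, T, p_+)|}{A^2} = 0, \quad 0 \le p_+ \le 1. \tag{176b}
$$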

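Using (176a), (176b), and the Triangle Inequality, (175) can be upper-bounded by

$$
C_{\mathrm{PP}}(P) \le \sup_{T \ge 0} \frac{A^2}{\sigma^2}\, \frac{\bigl[ 1 - (p_+ - p_-)^2 \bigr]\, e^{-\frac{T^2}{\sigma^2}}}{4\pi\, Q\!\left(\tfrac{T}{\sigma}\right) \left[ 1 - Q\!\left(\tfrac{T}{\sigma}\right) \right]} + o(A^2) \tag{177}
$$

where $\lim_{A \downarrow 0} o(A^2)/A^2 = 0$. Consequently, dividing (177) by $P = A^2$ and computing the limit as $P$ tends to zero, yields

$$
\begin{aligned}
\lim_{P \downarrow 0} \frac{C_{\mathrm{PP}}(P)}{P}
&\le \sup_{T \ge 0} \frac{1}{\sigma^2}\, \frac{\bigl[ 1 - (p_+ - p_-)^2 \bigr]\, e^{-\frac{T^2}{\sigma^2}}}{4\pi\, Q\!\left(\tfrac{T}{\sigma}\right) \left[ 1 - Q\!\left(\tfrac{T}{\sigma}\right) \right]} \\
&\le \sup_{T \ge 0} \frac{1}{\sigma^2}\, \frac{e^{-\frac{T^2}{\sigma^2}}}{4\pi\, Q\!\left(\tfrac{T}{\sigma}\right) \left[ 1 - Q\!\left(\tfrac{T}{\sigma}\right) \right]}
\end{aligned}
\tag{178}
$$

where the second inequality holds with equality for $p_+ = p_- = 1/2$.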

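Figure 3. The function $u \mapsto g(u)$ for $0 \le u \le 2$.

It remains to show that the maximum on the RHS of (178) is attained for $T = 0$. To this end, we argue that the function

$$
f(T) \triangleq \frac{1}{\sigma^2}\, \frac{e^{-\frac{T^2}{\sigma^2}}}{4\pi\, Q\!\left(\tfrac{T}{\sigma}\right) \left[ 1 - Q\!\left(\tfrac{T}{\sigma}\right) \right]}, \quad T \ge 0 \tag{179}
$$

is monotonically decreasing in $T \ge 0$. Indeed, the first derivative of $f(\cdot)$ is given by

$$
f'(T) = - \frac{1}{\sigma^3}\, \frac{e^{-\frac{T^2}{\sigma^2}}\, g\!\left(\tfrac{T}{\sigma}\right)}{4\pi\, Q^2\!\left(\tfrac{T}{\sigma}\right) \left[ 1 - Q\!\left(\tfrac{T}{\sigma}\right) \right]^2}, \quad T \ge 0 \tag{180}
$$

where

$$
g(u) \triangleq 2 u\, Q(u) [1 - Q(u)] - \frac{1}{\sqrt{2\pi}}\, e^{-\frac{u^2}{2}} [1 - 2 Q(u)] \tag{181}
$$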

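for $u \ge 0$. For $u \ge 2$, we bound the Q-function as [14, Prop. 19.4.2]

$$
\frac{3}{4}\, \frac{1}{\sqrt{2\pi}\, u}\, e^{-\frac{u^2}{2}} < Q(u) < \frac{1}{\sqrt{2\pi}\, u}\, e^{-\frac{u^2}{2}}, \quad u \ge 2 \tag{182}
$$

to obtain

$$
\begin{aligned}
g(u) &> \frac{1}{\sqrt{2\pi}}\, e^{-\frac{u^2}{2}} \left[ \frac{1}{2} - \frac{1}{2\sqrt{2\pi}}\, \frac{1}{u}\, e^{-\frac{u^2}{2}} \right] \\
&\ge \frac{1}{\sqrt{2\pi}}\, e^{-\frac{u^2}{2}} \left[ \frac{1}{2} - \frac{1}{4\sqrt{2\pi}}\, e^{-2} \right] \\
&\approx 0.1941 \cdot e^{-\frac{u^2}{2}} \\
&> 0, \quad u \ge 2.
\end{aligned}
\tag{183}
$$

Here the second step follows because $\frac{1}{u} \exp(-u^2/2)$ is monotonically decreasing in $u \ge 2$. For $0 < u \le 2$, it can be shown numerically that $g(u) > 0$; see Figure 3.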

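It thus follows that $g(T/\sigma) > 0$, $T/\sigma > 0$ and hence, by (180), $f'(T) < 0$, $T > 0$. Consequently,

$$
\max_{T \ge 0} f(T) = f(0) = \frac{1}{\pi \sigma^2} \tag{184}
$$

which together with (178) yields

$$
\lim_{P \downarrow 0} \frac{C_{\mathrm{PP}}(P)}{P} \le \frac{1}{\pi \sigma^2}. \tag{185}
$$

For $p_+ = p_- = 1/2$ this holds with equality, thus proving Proposition 2.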
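The monotonicity argument can be reproduced numerically, in the spirit of the numerical check behind Figure 3. The following is a minimal sketch with illustrative function names `f` and `g`. It assumes the forms $f(T) = \frac{1}{\sigma^2} e^{-T^2/\sigma^2} / (4\pi Q(T/\sigma)[1 - Q(T/\sigma)])$ and $g(u) = 2uQ(u)[1 - Q(u)] - \frac{1}{\sqrt{2\pi}} e^{-u^2/2}[1 - 2Q(u)]$, and the grids and ranges are arbitrary choices.

```python
import math


def q_func(x: float) -> float:
    """Gaussian tail probability Q(x) = Pr(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def f(thr: float, sigma: float = 1.0) -> float:
    """Low-SNR slope as a function of the threshold (assumed form of (179))."""
    q = q_func(thr / sigma)
    return math.exp(-thr ** 2 / sigma ** 2) / (4.0 * math.pi * sigma ** 2 * q * (1.0 - q))


def g(u: float) -> float:
    """Function whose sign determines the sign of -f' (assumed form of (181))."""
    q = q_func(u)
    phi = math.exp(-u * u / 2.0) / math.sqrt(2.0 * math.pi)
    return 2.0 * u * q * (1.0 - q) - phi * (1.0 - 2.0 * q)


if __name__ == "__main__":
    print("f(0) =", f(0.0), " 1/pi =", 1.0 / math.pi)
    print("min of g on (0, 2]:", min(g(k * 0.01) for k in range(1, 201)))
    print("f decreasing on [0, 5]:",
          all(f(k * 0.05) > f((k + 1) * 0.05) for k in range(100)))
```

At $T = 0$ the function equals $1/(\pi\sigma^2)$, which is $2/\pi$ times the slope $1/(2\sigma^2)$ of the unquantized channel.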



X. Proofs: Fading Channels 

A. Proof of Theorem 4 

We will lower-bound the RHS of (65) by restricting the supremum to radial quantizers

$$
\mathcal{D} = \{ \tilde{y} \in \mathbb{C} : |\tilde{y}| \ge T \}, \quad T > 0 \tag{186}
$$

and thus demonstrate that

$$
\dot{C}(0) \ge \frac{1}{\sigma^2}. \tag{187}
$$

Together with the upper bound (66), this will prove Theorem 4.

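To prove (187), we first note that, conditioned on $(H, X) = (h, x)$, the squared magnitude of $\sqrt{2/\sigma^2}\, \tilde{Y}$ has a noncentral chi-square distribution with 2 degrees of freedom and noncentrality parameter $2 |h|^2 |x|^2 / \sigma^2$ [25, p. 8]. Consequently, a radial quantizer induces the channel [25, Sec. 2-E]

$$
\Pr(Y = 1 \mid H = h, X = x) = Q_1\!\left( \sqrt{\tfrac{2}{\sigma^2}}\, |h| |x|,\; \sqrt{\tfrac{2}{\sigma^2}}\, T \right)
$$

for $h \in \mathbb{C}$, $x \in \mathbb{C}$, and $T > 0$, where $Q_1(\cdot, \cdot)$ denotes the first-order Marcum Q-function [25, Eq. (2.20)]. For $x = 0$ this becomes

$$
\Pr(Y = 1 \mid H = h, X = 0) = e^{-\frac{T^2}{\sigma^2}} \quad (h \in \mathbb{C},\ T > 0).
$$

This yields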



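$$
\begin{aligned}
D(P_{Y|H,X=\xi} \,\|\, P_{Y|H,X=0} \mid P_H)
&= \mathsf{E}\!\left[ \frac{T^2}{\sigma^2}\, Q_1\!\left( \sqrt{\tfrac{2}{\sigma^2}}\, |H| |\xi|,\; \sqrt{\tfrac{2}{\sigma^2}}\, T \right) \right] \\
&\quad + \mathsf{E}\!\left[ \left( 1 - Q_1\!\left( \sqrt{\tfrac{2}{\sigma^2}}\, |H| |\xi|,\; \sqrt{\tfrac{2}{\sigma^2}}\, T \right) \right) \log \frac{1}{1 - e^{-\frac{T^2}{\sigma^2}}} \right] \\
&\quad - \mathsf{E}\!\left[ H_b\!\left( Q_1\!\left( \sqrt{\tfrac{2}{\sigma^2}}\, |H| |\xi|,\; \sqrt{\tfrac{2}{\sigma^2}}\, T \right) \right) \right]
\end{aligned}
\tag{188a}
$$

$$
\ge \mathsf{E}\!\left[ \frac{T^2}{\sigma^2}\, Q_1\!\left( \sqrt{\tfrac{2}{\sigma^2}}\, |H| |\xi|,\; \sqrt{\tfrac{2}{\sigma^2}}\, T \right) \right] - \log 2 \tag{188b}
$$

where (188b) follows because the second term in (188a) is nonnegative, and because the binary entropy function is upper-bounded by $\log 2$.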

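By applying (188b) to (65), we obtain

$$
\dot{C}(0) \ge \sup_{\xi \ne 0,\, T > 0} \left\{ \frac{1}{|\xi|^2}\, \mathsf{E}\!\left[ \frac{T^2}{\sigma^2}\, Q_1\!\left( \sqrt{\tfrac{2}{\sigma^2}}\, |H| |\xi|,\; \sqrt{\tfrac{2}{\sigma^2}}\, T \right) \right] - \frac{1}{|\xi|^2} \log 2 \right\}. \tag{189}
$$

We lower-bound the supremum on the RHS of (189) by choosing $T = \mu |h| |\xi|$ for some fixed $0 < \mu < 1$ and by taking $|\xi|$ to infinity. We then lower-bound the first-order Marcum Q-function using [25, Sec. C-2, Eq. (C.24)]

$$
Q_1(\alpha, \beta) \ge 1 - \frac{1}{2} \left[ \exp\!\left( - \frac{(\alpha - \beta)^2}{2} \right) - \exp\!\left( - \frac{(\alpha + \beta)^2}{2} \right) \right] \tag{190}
$$

for $\alpha > \beta \ge 0$. This yields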



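$$
\begin{aligned}
\dot{C}(0)
&\ge \lim_{|\xi| \to \infty} \left\{ \frac{1}{|\xi|^2}\, \mathsf{E}\!\left[ \frac{\mu^2 |H|^2 |\xi|^2}{\sigma^2} \left( 1 - \frac{1}{2} \exp\!\left( - \frac{|H|^2 |\xi|^2}{\sigma^2} (1 - \mu)^2 \right) + \frac{1}{2} \exp\!\left( - \frac{|H|^2 |\xi|^2}{\sigma^2} (1 + \mu)^2 \right) \right) \right] - \frac{\log 2}{|\xi|^2} \right\} \\
&\ge \frac{\mu^2}{\sigma^2}\, \mathsf{E}\bigl[ |H|^2 \bigr] - \lim_{|\xi| \to \infty} \frac{\mu^2}{2 |\xi|^2\, e\, (1 - \mu)^2} \\
&= \frac{\mu^2}{\sigma^2}\, \mathsf{E}\bigl[ |H|^2 \bigr]
\end{aligned}
\tag{191}
$$

where the second step follows because $0 \le x e^{-\alpha x} \le 1/(e \alpha)$ for every $x \ge 0$ and $\alpha > 0$. This establishes (187) because $H$ is of unit variance and $\mu$ can be arbitrarily close to 1.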
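To see how quickly the radial-quantizer lower bound approaches $\mu^2/\sigma^2$, one can evaluate the bound obtained from (188b) and (190) in closed form. The following is a minimal sketch under an explicit assumption that goes beyond the text above: $H$ is taken to be a zero-mean, unit-variance, circularly symmetric complex Gaussian, so that $|H|^2$ is exponentially distributed with mean 1 and $\mathsf{E}[|H|^2 e^{-c|H|^2}] = 1/(1+c)^2$. The function name `coherent_lower_bound` is hypothetical.

```python
import math


def coherent_lower_bound(xi_abs: float, mu: float, sigma: float = 1.0) -> float:
    """Lower bound on E[D] / |xi|^2 for the threshold T = mu * |h| * |xi|.

    Assumes |H|^2 ~ Exp(1) (Rayleigh fading with unit variance), so that
    E[|H|^2 exp(-c |H|^2)] = 1 / (1 + c)^2.
    """
    s = xi_abs ** 2 / sigma ** 2
    c_minus = (1.0 - mu) ** 2 * s
    c_plus = (1.0 + mu) ** 2 * s
    # E[(T^2 / sigma^2) * (Marcum-Q lower bound)] in closed form.
    mean_term = mu ** 2 * s * (1.0 - 0.5 / (1.0 + c_minus) ** 2 + 0.5 / (1.0 + c_plus) ** 2)
    # Subtract log 2, which upper-bounds the binary entropy term.
    return (mean_term - math.log(2.0)) / xi_abs ** 2


if __name__ == "__main__":
    for mu, xi_abs in ((0.5, 1e2), (0.9, 1e3), (0.99, 1e4)):
        print(mu, xi_abs, coherent_lower_bound(xi_abs, mu), "target mu^2 =", mu ** 2)
```

As $|\xi|$ grows the bound tends to $\mu^2/\sigma^2$, and letting $\mu$ approach 1 recovers $1/\sigma^2$.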

B. Proof of Theorem 5 

By the Data Processing Inequality for relative entropy, the relative entropy on the RHS of (68) is upper-bounded by the relative entropy corresponding to the unquantized channel, i.e., [3, Eq. (64)]

$$
\frac{D(P_{Y|X=\xi} \,\|\, P_{Y|X=0})}{|\xi|^2} \le \frac{1}{\sigma^2} - \frac{1}{|\xi|^2} \log\!\left( 1 + \frac{|\xi|^2}{\sigma^2} \right). \tag{192}
$$

Consequently, the capacity per unit-energy (68) is strictly smaller than $1/\sigma^2$ unless the supremum on the RHS of (68) is achieved as $|\xi|$ tends to infinity. It thus remains to show that

$$
\varlimsup_{|\xi| \to \infty}\; \sup_{\mathcal{D}} \frac{D(P_{Y|X=\xi} \,\|\, P_{Y|X=0})}{|\xi|^2} < \frac{1}{\sigma^2}. \tag{193}
$$



To this end, we first note that, for every $\xi \ne 0$, the supremum in (193) over all quantizers $\mathcal{D}$ can be replaced with the supremum over all radial quantizers (186). Indeed, for every quantization region satisfying

$$
\Pr(Y = 1 \mid X = \xi) = \beta, \quad 0 < \beta < 1
$$

the relative entropy

$$
D(P_{Y|X=\xi} \,\|\, P_{Y|X=0}) = \beta \log \frac{1}{\Pr(Y = 1 \mid X = 0)} + (1 - \beta) \log \frac{1}{1 - \Pr(Y = 1 \mid X = 0)} - H_b(\beta) \tag{194}
$$

is a convex function of $\Pr(Y = 1 \mid X = 0)$. Thus, for every $0 < \beta < 1$, the RHS of (194) is maximized for the quantization region that minimizes (or maximizes) $\Pr(Y = 1 \mid X = 0)$ while holding $\Pr(Y = 1 \mid X = \xi) = \beta$ fixed. By the Neyman-Pearson lemma [26], such a quantization region has the form



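$$
\mathcal{D}^\star = \left\{ \tilde{y} \in \mathbb{C} : \frac{f(\tilde{y}|0)}{f(\tilde{y}|\xi)} \le \Lambda \right\}, \quad \Lambda > 0 \tag{195}
$$

(or the complement thereof), where $f(\tilde{y}|x)$ denotes the conditional density of $\tilde{Y}$, conditioned on $X = x$, and where $\Lambda$ is such that $\Pr(\tilde{Y} \in \mathcal{D}^\star \mid X = \xi) = \beta$. (Note that for every $0 < \beta < 1$ there exists such a $\Lambda$ since, for the channel model (62), $\Pr(\tilde{Y} \in \mathcal{D}^\star \mid X = \xi)$ is a continuous, strictly increasing function of $\Lambda > 0$.) The likelihood ratio on the RHS of (195) is readily evaluated as

$$
\frac{f(\tilde{y}|0)}{f(\tilde{y}|\xi)} = \left( 1 + \frac{|\xi|^2}{\sigma^2} \right) \exp\!\left( - \frac{|\tilde{y}|^2\, |\xi|^2}{\sigma^2 (|\xi|^2 + \sigma^2)} \right) \tag{196}
$$

from which we obtain that (195) is a radial quantizer with threshold

$$
T = \sigma \sqrt{ \left( 1 + \frac{\sigma^2}{|\xi|^2} \right) \log \frac{1 + \frac{|\xi|^2}{\sigma^2}}{\Lambda} }. \tag{197}
$$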

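Thus, for every $\xi \ne 0$, the relative entropy $D(P_{Y|X=\xi} \,\|\, P_{Y|X=0})$ is maximized by a radial quantizer. For such a radial quantizer, we have

$$
\Pr(Y = 1 \mid X = x) = \exp\!\left( - \frac{T^2}{|x|^2 + \sigma^2} \right) \tag{198}
$$

for $x \in \mathbb{C}$ and $T > 0$. Consequently,

$$
\begin{aligned}
D(P_{Y|X=\xi} \,\|\, P_{Y|X=0})
&= e^{-\frac{T^2}{|\xi|^2 + \sigma^2}} \log \frac{1}{e^{-\frac{T^2}{\sigma^2}}} + \left( 1 - e^{-\frac{T^2}{|\xi|^2 + \sigma^2}} \right) \log \frac{1}{1 - e^{-\frac{T^2}{\sigma^2}}} - H_b\!\left( e^{-\frac{T^2}{|\xi|^2 + \sigma^2}} \right) \\
&\le \frac{T^2}{\sigma^2}\, e^{-\frac{T^2}{|\xi|^2 + \sigma^2}} + \left( 1 - e^{-\frac{T^2}{\sigma^2}} \right) \log \frac{1}{1 - e^{-\frac{T^2}{\sigma^2}}} \\
&\le \frac{T^2}{\sigma^2}\, e^{-\frac{T^2}{|\xi|^2 + \sigma^2}} + \frac{1}{e}
\end{aligned}
\tag{199}
$$

where the second step follows because $H_b(\cdot) \ge 0$ and $\exp(-T^2/(|\xi|^2 + \sigma^2)) \ge \exp(-T^2/\sigma^2)$; and the third step follows because $-x \log x \le 1/e$, $0 < x < 1$.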

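The first term on the RHS of (199) is maximized for $T^2 = |\xi|^2 + \sigma^2$, which yields

$$
\frac{T^2}{\sigma^2}\, e^{-\frac{T^2}{|\xi|^2 + \sigma^2}} \le \frac{|\xi|^2}{\sigma^2}\, \frac{1}{e} + \frac{1}{e}, \quad T > 0. \tag{200}
$$

The RHS of (199) is thus upper-bounded by

$$
D(P_{Y|X=\xi} \,\|\, P_{Y|X=0}) \le \frac{|\xi|^2}{\sigma^2}\, \frac{1}{e} + \frac{2}{e}. \tag{201}
$$

Dividing the RHS of (201) by $|\xi|^2$, and computing the limit as $|\xi|$ tends to infinity, yields

$$
\varlimsup_{|\xi| \to \infty}\; \sup_{\mathcal{D}} \frac{D(P_{Y|X=\xi} \,\|\, P_{Y|X=0})}{|\xi|^2} \le \frac{1}{e}\, \frac{1}{\sigma^2} < \frac{1}{\sigma^2}. \tag{202}
$$

This proves Theorem 5.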
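The gap between the quantized and the unquantized noncoherent channel can be made concrete by maximizing the relative entropy of a radial quantizer over its threshold. The following is a minimal sketch assuming natural logarithms and the channel law (198); the function name `kl_radial`, the grid of thresholds, and the choice $|\xi| = 10$ are illustrative.

```python
import math


def kl_radial(xi_abs: float, thr: float, sigma: float = 1.0) -> float:
    """D(P_{Y|X=xi} || P_{Y|X=0}) in nats for the radial quantizer |y| >= thr > 0."""
    log_a = -thr ** 2 / (xi_abs ** 2 + sigma ** 2)  # log Pr(Y = 1 | X = xi)
    log_b = -thr ** 2 / sigma ** 2                  # log Pr(Y = 1 | X = 0)
    a = math.exp(log_a)
    one_minus_a = -math.expm1(log_a)
    one_minus_b = -math.expm1(log_b)
    # Work with logarithms so that a vanishing Pr(Y = 1 | X = 0) causes no overflow.
    return a * (log_a - log_b) + one_minus_a * (math.log(one_minus_a) - math.log(one_minus_b))


if __name__ == "__main__":
    xi_abs, sigma = 10.0, 1.0
    best = max(kl_radial(xi_abs, 0.05 * k, sigma) for k in range(1, 1201)) / xi_abs ** 2
    bound = 1.0 / (math.e * sigma ** 2) + 2.0 / (math.e * xi_abs ** 2)          # from (201)
    unquantized = 1.0 / sigma ** 2 - math.log(1.0 + xi_abs ** 2 / sigma ** 2) / xi_abs ** 2  # (192)
    print(f"best radial quantizer: {best:.4f}")
    print(f"bound from (201):      {bound:.4f}")
    print(f"unquantized (192):     {unquantized:.4f}")
```

The best radial quantizer stays below the bound derived from (201), which itself is well below the unquantized value.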



XI. Summary and Conclusion 

It is well known that quantizing the output of the discrete-
time, average-power-limited Gaussian channel using a sym-
metric threshold quantizer reduces the capacity per unit-energy
by a factor of 2/π, a loss which translates to a power
loss of approximately 2 dB. We have shown that this loss
can be avoided by using asymmetric threshold quantizers
with corresponding asymmetric signal constellations. We have
further shown that the capacity per unit-energy can be achieved 
by a PPM scheme. For this scheme, the error probability can 
be analyzed directly using the Union Bound and the standard 
upper bound on the Q-function (46). We thus need not resort 
to conventional methods used to prove coding theorems, such 
as the method of types, information-spectrum methods, or 
random coding exponents. 

The above results demonstrate that the 2 dB power loss
incurred on the Gaussian channel with symmetric one-bit 
output quantization is not due to the hard decisions but 
due to the suboptimal quantizer. In fact, if we employ an 
asymmetric threshold quantizer, and if we use asymmetric 
signal constellations, then hard-decision decoding achieves the 
capacity per unit-energy of the Gaussian channel. 

The above results also demonstrate that a threshold quan- 
tizer is asymptotically optimal as the SNR tends to zero. We 
have further shown that this is not only true asymptotically: 
for every fixed SNR, we have shown that, among all one-bit 
quantizers, a threshold quantizer is optimal. 

We have also shown that the capacity per unit-energy can 
only be achieved by flash-signaling input distributions. Since 
such signaling leads to poor spectral efficiencies, a significant 
loss in spectral efficiency is unavoidable. 

For Rayleigh-fading channels, we have shown that, in the 
coherent case, a one-bit quantizer does not reduce the capacity 
per unit-energy, provided that we allow the quantizer to depend 
on the fading level. We have further shown that this result is 
no longer true in the noncoherent case: here all one-bit output 
quantizers reduce the capacity per unit-energy. 



Appendix I
Appendix to Section VII-A

Lemma 1: Let $\mathcal{D}$ be a Borel subset of the reals, and let the sequence of real numbers $\{x_k\}$ converge to $\xi$. Let $Z$ be a zero-mean Gaussian random variable of positive variance $\sigma^2$. Then

$$
\lim_{k \to \infty} \Pr(x_k + Z \in \mathcal{D}) = \Pr(\xi + Z \in \mathcal{D}). \tag{203}
$$

Proof: Let $f(\cdot)$ denote the density of a zero-mean, variance-$\sigma^2$ Gaussian random variable, so

$$
\Pr(x_k + Z \in \mathcal{D}) = \int_{\mathcal{D}} f(y - x_k)\,\mathrm{d}y.
$$

Since the density of a zero-mean, variance-$\sigma^2$ Gaussian random variable is continuous, and since the sequence $\{x_k\}$ converges to $\xi$, it follows that the sequence of densities $y \mapsto f(y - x_k)$ converges to $y \mapsto f(y - \xi)$. The result follows then by noting that, for every $k$,

$$
\Pr(x_k + Z \in \mathbb{R}) = \Pr(\xi + Z \in \mathbb{R}) = 1
$$

and from Scheffé's theorem [27, Th. 16.12]. ■

From Lemma 1 we conclude that $x \mapsto \Pr(Y = 1 \mid X = x)$ is continuous. Since it is also bounded, it follows that $\Pr(Y = 1)$ is continuous in the input distribution under the weak topology. Since the binary entropy function is a continuous bounded function, this implies that $H(Y)$ is continuous in the input distribution. By the same lemma, it follows that also the mapping $x \mapsto H_b(\Pr(Y = 1 \mid X = x))$ is continuous and bounded, so $H(Y|X)$ is also continuous in the input distribution. We thus have the following lemma.

Lemma 2: For every fixed quantizer $\mathcal{D}$, the functionals $H(Y)$, $H(Y|X)$, and $I(X;Y)$ are continuous in the input distribution under the weak topology.

For proving the existence of a capacity-achieving input distribution we need a compactness result:

Lemma 3: Let $A > 0$ be fixed. Every sequence of probability measures on the interval $[-A, A]$ of second moment not exceeding $P$ has a subsequence that converges weakly to a probability distribution on the interval $[-A, A]$ of second moment not exceeding $P$.

Proof: By Prokhorov's Theorem, every sequence of probability measures on $[-A, A]$ has a subsequence that converges weakly to some probability measure on $[-A, A]$. The second moment of this limiting probability measure cannot exceed $P$ because the function $x \mapsto x^2$ is a continuous bounded function on the interval $[-A, A]$. ■

Note that Lemma 3 continues to hold for sequences of probability measures on $\mathbb{R}$ of second moment not exceeding $P$, albeit with a slightly different proof. Thus, the amplitude constraint $A$ is not essential.

It follows from Lemmas 1-3 that the supremum in (77) defining $C_{\mathcal{D},A}(P)$ is achieved.

Appendix II
Appendix to Section VII-D

We show that, for $\xi < \theta$, the function $\xi \mapsto W(T_1, T_2 \mid \xi)$ is strictly increasing. To this end, we note that

$$
W(T_1, T_2 \mid \xi) = Q\!\left( \frac{\theta - \Delta - \xi}{\sigma} \right) - Q\!\left( \frac{\theta + \Delta - \xi}{\sigma} \right) \tag{204}
$$

and take the derivative with respect to $\xi$. (Recall that $\theta = (T_1 + T_2)/2$ and $\Delta = (T_2 - T_1)/2$.) This yields

$$
\begin{aligned}
\frac{\partial}{\partial \xi} W(T_1, T_2 \mid \xi)
&= \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(\xi - \theta + \Delta)^2}{2\sigma^2}} - \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(\xi - \theta - \Delta)^2}{2\sigma^2}} \\
&= \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(\xi - \theta)^2 + \Delta^2}{2\sigma^2}} \left( e^{\frac{(\theta - \xi)\Delta}{\sigma^2}} - e^{-\frac{(\theta - \xi)\Delta}{\sigma^2}} \right) \\
&> 0, \quad \xi < \theta
\end{aligned}
\tag{205}
$$

thus proving the claim.
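Appendix III
Appendix to Section VIII-B

To show that

$$
\lim_{\xi \to 0} \frac{\Psi(\xi)}{\xi^2} = \frac{1}{2\sigma^2} \left( \frac{1}{2} + \frac{1}{\pi} \right) \tag{206}
$$

we write $\Psi(\xi)$ as

$$
\Psi(\xi) = \frac{\xi^2}{2\sigma^2}\, \frac{1}{\sqrt{2\pi\sigma^2}} \int_0^\infty e^{-\frac{(\tilde{y} - \xi)^2}{2\sigma^2}}\,\mathrm{d}\tilde{y}
+ \frac{\xi}{\sqrt{2\pi\sigma^2}} \left( e^{-\frac{\xi^2}{2\sigma^2}} - 1 \right)
+ \left[ Q\!\left( \frac{\xi}{\sigma} \right) \log\!\left( 2 Q\!\left( \frac{\xi}{\sigma} \right) \right) + \frac{\xi}{\sqrt{2\pi\sigma^2}} \right] \tag{207}
$$

and compute the limiting ratio of each term on the RHS of (207) to $\xi^2$ as $\xi$ tends to zero. For the first two terms, we have

$$
\lim_{\xi \to 0} \frac{1}{\xi^2}\, \frac{\xi^2}{2\sigma^2}\, \frac{1}{\sqrt{2\pi\sigma^2}} \int_0^\infty e^{-\frac{(\tilde{y} - \xi)^2}{2\sigma^2}}\,\mathrm{d}\tilde{y} = \frac{1}{4\sigma^2} \tag{208}
$$

and

$$
\lim_{\xi \to 0} \frac{1}{\xi^2}\, \frac{\xi}{\sqrt{2\pi\sigma^2}} \left( e^{-\frac{\xi^2}{2\sigma^2}} - 1 \right) = 0. \tag{209}
$$

To evaluate the last term on the RHS of (207), we express $\xi \mapsto Q(\xi/\sigma)$ as a Taylor series around zero

$$
Q\!\left( \frac{\xi}{\sigma} \right) = \frac{1}{2} - \frac{\xi}{\sqrt{2\pi\sigma^2}} + o(\xi^2). \tag{210}
$$

With this, we obtain

$$
\begin{aligned}
Q\!\left( \frac{\xi}{\sigma} \right) \log\!\left( 2 Q\!\left( \frac{\xi}{\sigma} \right) \right)
&= \left( \frac{1}{2} - \frac{\xi}{\sqrt{2\pi\sigma^2}} + o(\xi^2) \right) \log\!\left( 1 - \frac{2\xi}{\sqrt{2\pi\sigma^2}} + o(\xi^2) \right) \\
&= - \frac{\xi}{\sqrt{2\pi\sigma^2}} + \frac{\xi^2}{2\pi\sigma^2} + o(\xi^2)
\end{aligned}
\tag{211}
$$

where the second step follows because

$$
\log(1 + x) = x - \tfrac{1}{2} x^2 + o(x^2).
$$

Consequently,

$$
\lim_{\xi \to 0} \frac{Q\!\left( \frac{\xi}{\sigma} \right) \log\!\left( 2 Q\!\left( \frac{\xi}{\sigma} \right) \right) + \frac{\xi}{\sqrt{2\pi\sigma^2}}}{\xi^2} = \frac{1}{2\pi\sigma^2}. \tag{212}
$$

The claim follows then by combining (208)-(212) with (207).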



Appendix IV 
Appendix to Section IX-B 



A. Proof of (176a) 

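To prove (176a), namely

$$
\lim_{A \downarrow 0}\, \sup_{T \ge 0} \frac{|R_H(A, T, p_+)|}{A^2} = 0, \quad 0 \le p_+ \le 1
$$

we fix some $\nu > 1$ and analyze the cases $0 \le T \le \nu$ and $T > \nu$ separately. Since we are interested in the limit as $A$ tends to zero, we will assume that $A < 1$.
If $0 \le T \le \nu$, then $\tilde{p}$ in (169) is bounded by

$$
Q\!\left( \frac{T + A}{\sigma} \right) \le \tilde{p} \le Q\!\left( \frac{T - A}{\sigma} \right) \tag{213}
$$

which, by the assumption that $A < 1$, implies that $\tilde{p}$ is bounded away from 0 and 1:

$$
Q\!\left( \frac{\nu + 1}{\sigma} \right) \le \tilde{p} \le Q\!\left( - \frac{1}{\sigma} \right). \tag{214}
$$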



Consequently, combining (170) with (169) and using the
Triangle Inequality yields for $0 \le T \le \nu$



|R^(A,T,p+)| 

A \p+-p-\ 



< 



< 



< 



2<T 

A 



1 



2(7 x/2^ 



/27r 



e 2a 



A^ 



e"i^ + |Rq(A,T,P4 



|Rq(a,t,p+)| 

1 3 



j52(l — j5)2 



p2(l -p) 



2(T\/2ir 2(T2\/2ire 



(215) 



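Here the second step follows by upper-bounding $|1 - 2\tilde{p}| \le 1$ and $|p_+ - p_-| \le 1$; and the third step follows from (172) and (214) and by upper-bounding $\exp(-T^2/(2\sigma^2)) \le 1$. Since the RHS of (215) does not depend on $T$, this yields

$$
\lim_{A \downarrow 0}\, \sup_{0 \le T \le \nu} \frac{|R_H(A, T, p_+)|}{A^2} = 0, \quad 0 \le p_+ \le 1. \tag{216}
$$

For $T > \nu$, we first upper-bound (171)

$$
\begin{aligned}
|R_Q(A, T, p_+)|
&\le \frac{A^2}{2\sigma^2}\, \frac{T + A}{\sigma}\, \frac{1}{\sqrt{2\pi}}\, e^{-\frac{(T - A)^2}{2\sigma^2}} \\
&\le \frac{A^2}{\sigma^2}\, \frac{T}{\sigma}\, \frac{1}{\sqrt{2\pi}}\, e^{-\frac{(T - 1)^2}{2\sigma^2}}
\end{aligned}
\tag{217}
$$

where the first step follows by upper-bounding $\tilde{x} \le T + A$ and $\exp(-\tilde{x}^2/(2\sigma^2)) \le \exp(-(T - A)^2/(2\sigma^2))$; and the second step follows because $T > \nu$ and $A < 1$, so $A < T$. Combining (217) with (169) yields for $T > \nu$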



|Rff(A,T,p+)| 
A b+- 



< 



< 



2cr 
A 



-e 2^ 



1 3 



|Rq(A,T,p4 



1 



A" 



< 



2(7 V2^ 
(27rcr2)i^ 



(^2 V27rCT2 

A A^" 



2 (t2 J p2(^l 



p2(l — p)2 
1 

p2(l — p)2 

(218) 






where the first step follows from the Triangle Inequality; the second step follows from (217) and because $|p_+ - p_-| \le 1$ and $|1 - 2\tilde{p}| \le 1$; and the last step follows because

$$
\exp(-T^2/(2\sigma^2)) \le T \exp(-(T - 1)^2/(2\sigma^2)).
$$

We next note that, since $T > \nu > A$ and $0 < A < 1$, we have

$$
\tilde{p} \le Q\!\left( \frac{T - A}{\sigma} \right) \le \frac{1}{2} \tag{219}
$$



and 



P>Q 



T + A 



(T + A)V V2^(T + A) 

^2 ' 



/2^(T + 1) 



(T+ A)^ 



T > 1/ 



(220) 



where the second step follows from [14, Prop. 19.4.2]. Con- 
sequently, using (219) and (220), the RHS of (218) can be 
upper-bounded by 

|Rff(A,T,p+)| 



< 



Y3 



A I A-" 

2 ^ CT^ 



(T+l)3 



17^ I 27r(T+l) 



76 



A I A^ 

2 CT^ 



27rc75 ( 1 



X exp 



3(T-l)2 (T + 1) 



2a2 



Since the function 



T^T^^(T + l)2cxp 



2^2 + —2 



is bounded in $T > \nu$, it follows that

$$
\lim_{A \downarrow 0}\, \sup_{T > \nu} \frac{|R_H(A, T, p_+)|}{A^2} = 0, \quad 0 \le p_+ \le 1. \tag{222}
$$

Combining (216) and (222) proves (176a).

B. Proof of (176b) 

To prove (176b), namely

$$
\lim_{A \downarrow 0}\, \sup_{T \ge 0} \frac{|K(A, T, p_+)|}{A^2} = 0, \quad 0 \le p_+ \le 1
$$

we fix some $\nu > 1$ and analyze the cases $0 \le T \le \nu$ and $T > \nu$ separately. As in the previous section, we will assume that $A < 1$. If $0 \le T \le \nu$, then we have



(223) 



which yields for every $0 \le p_+ \le 1$ and every $A < 1$

|K(A,T,p+)| 



A 2(p+-p-) ^ 



^Rq(a,t,p+) + |Rq(a,t,p+)| 



2Q(i)[i-Q(l)] 



< 



Afci^|R^(A,T,p+)| + |RQ(A,T,p+)|' 



< 



1 



A-^ 



A^ 



< T < I/. 



(224) 



(t3 27rVe 4cr427re_ 

Here the second step follows from (223), from the upper bound $\exp(-T^2/(2\sigma^2)) \le 1$, $T \in \mathbb{R}$ and from the Triangle Inequality; and the third step follows from (172) and because $|p_+ - p_-| \le 1$. Consequently,

$$
\lim_{A \downarrow 0}\, \sup_{0 \le T \le \nu} \frac{|K(A, T, p_+)|}{A^2} = 0, \quad 0 \le p_+ \le 1. \tag{225}
$$

If $T > \nu$, then we have [14, Prop. 19.4.2]



V27rT 
and, by (217), 

We thus obtain for $T > \nu$



< 



(225) 



(226) 



A' 



T 



T > ly. (227) 



|K(A,T,p+)| 

A 2(p+-p-) , 



/27r 



■Rq(a,t,p+) + |Rq(a,t,p+)| 



T > I/. (221) < 



27rTe2^ 



2Q(i)[i-Q(i)] 

A2\p+-p_\ _r_ 



< 



2nTe^ [ A'' T 
1 T2 t 



Rq(A,T,p+)| 



|Rq(a,t,p+) 



A'' (T-i)2 
+ ^n: -e 



< 



TT 1 - i 

V i 

2" 1 



(72 
Y2 



(T-l)^ 



A T 

2a 

Y2 

2^ 



1 + ::^ 



1 



ct2 

A^ 

3 



(7 

A^ 

a3 



(228) 



where the second step follows from (226) and from the Triangle Inequality; the third step follows from (227) and because $|p_+ - p_-| \le 1$; the fourth step follows by upper-bounding $\exp(-T^2/(2\sigma^2)) \le \exp(-(T - 1)^2/(2\sigma^2))$; and the last step follows because $T > \nu$ and $A < 1$, so $A < T$.
Since the function 

^2 



T2 r, 

T H- > — ^e2<T 



1 



2(72 



is bounded in $T > \nu$, it follows that

$$
\lim_{A \downarrow 0}\, \sup_{T > \nu} \frac{|K(A, T, p_+)|}{A^2} = 0, \quad 0 \le p_+ \le 1. \tag{229}
$$

Combining (225) and (229) proves (176b).



Acknowledgment 

The authors wish to thank Paul P. Sotiriadis, who sparked
their interest in the problem of quantization. They further wish
to thank Tamás Linder, Alfonso Martinez, and Sergio Verdú
for enlightening discussions and the Associate Editor Young-
Han Kim and the anonymous referees for their valuable
comments.

References 

[1] R. H. Walden, "Analog-to-digital converter survey and analysis," IEEE J. Select. Areas Commun., vol. 17, no. 4, pp. 539-550, Apr. 1999.
[2] A. J. Viterbi and J. K. Omura, Principles of Digital Communication and Coding. McGraw-Hill, 1979.
[3] S. Verdú, "Spectral efficiency in the wideband regime," IEEE Trans. Inform. Theory, vol. 48, no. 6, pp. 1319-1343, June 2002.
[4] C. E. Shannon, "A mathematical theory of communication," Bell System Techn. J., vol. 27, pp. 379-423 and 623-656, July and Oct. 1948.
[5] R. G. Gallager, Information Theory and Reliable Communication. John Wiley & Sons, 1968.
[6] T. Koch and A. Lapidoth, "Increased capacity per unit-cost by oversampling," in Proc. IEEE 26th Conv. of Electrical and Electronics Eng. in Israel, 2010, pp. 684-688.
[7] T. Koch and A. Lapidoth, "Increased capacity per unit-cost by oversampling," Sept. 2010. [Online]. Available: http://arxiv.org/abs/1008.5393
[8] E. N. Gilbert, "Increased information rate by oversampling," IEEE Trans. Inform. Theory, vol. 39, pp. 1973-1976, Nov. 1993.
[9] S. Shamai (Shitz), "Information rates by oversampling the sign of a bandlimited process," IEEE Trans. Inform. Theory, vol. 40, pp. 1230-1236, July 1994.
[10] T. M. Cover and J. A. Thomas, Elements of Information Theory, 1st ed. John Wiley & Sons, 1991.
[11] S. Verdú, "On channel capacity per unit cost," IEEE Trans. Inform. Theory, vol. 36, pp. 1019-1030, Sept. 1990.
[12] J. Singh, O. Dabeer, and U. Madhow, "On the limits of communication with low-precision analog-to-digital conversion at the receiver," IEEE Trans. Commun., vol. 57, no. 12, pp. 3629-3639, Dec. 2009.
[13] S. Graf and H. Luschgy, Foundations of Quantization for Probability Distributions, ser. Lecture Notes in Mathematics. Springer Verlag, 2000, vol. 1730.
[14] A. Lapidoth, A Foundation in Digital Communication. Cambridge University Press, 2009.
[15] J. M. Wozencraft and I. M. Jacobs, Principles of Communication Engineering. John Wiley & Sons, 1965.
[16] P. Zhang, F. M. J. Willems, and L. Huang, "Investigations of noncoherent OOK based schemes with soft and hard decisions for WSNs," in Proc. 49th Allerton Conf. Comm., Contr. and Comp., Allerton H., Monticello, IL, Sept. 28-30, 2011, pp. 1702-1709.
[17] A. Lapidoth and S. Shamai (Shitz), "Fading channels: how perfect need 'perfect side-information' be?" IEEE Trans. Inform. Theory, vol. 48, no. 5, pp. 1118-1134, May 2002.
[18] A. Mezghani and J. A. Nossek, "On ultra-wideband MIMO systems with 1-bit quantized outputs: Performance analysis and input optimization," in Proc. IEEE Int. Symposium on Inf. Theory, Nice, France, June 24-29, 2007, pp. 1286-1289.
[19] A. Mezghani and J. A. Nossek, "Analysis of Rayleigh-fading channels with 1-bit quantized output," in Proc. IEEE Int. Symposium on Inf. Theory, Toronto, Canada, July 6-11, 2008, pp. 260-264.
[20] A. Mezghani and J. A. Nossek, "Analysis of 1-bit output noncoherent fading channels in the low SNR regime," in Proc. IEEE Int. Symposium on Inf. Theory, Seoul, Korea, June 28 - July 3, 2009, pp. 1080-1084.
[21] S. Krone and G. Fettweis, "Fading channels with 1-bit output quantization: Optimal modulation, ergodic capacity and outage probability," in Proc. Inform. Theory Workshop (ITW), Dublin, Ireland, Aug. 30 - Sept. 3, 2010, pp. 1-5.
[22] T. Koch and A. Lapidoth, "One-bit quantizers for fading channels," in Proc. IZS, Zurich, Switzerland, Feb. 29 - Mar. 2, 2012, pp. 36-39.
[23] H. S. Witsenhausen, "Some aspects of convexity useful in information theory," IEEE Trans. Inform. Theory, vol. 26, no. 3, pp. 265-271, May 1980.
[24] R. T. Rockafellar, Convex Analysis. Princeton University Press, 1970.
[25] M. K. Simon, Probability Distributions Involving Gaussian Random Variables: A Handbook for Engineers and Scientists. Kluwer Academic Publishers, 2002.
[26] J. Neyman and E. Pearson, "On the problem of the most efficient test of statistical hypotheses," Phil. Trans. R. Soc. Lond. A, vol. 231, no. 694-706, pp. 289-337, Jan. 1932.
[27] P. Billingsley, Probability and Measure, 3rd ed., ser. Wiley Series in Probability and Mathematical Statistics: Probability and Mathematical Statistics. John Wiley & Sons, 1995.



